INFERENCE ON PARAMETER SETS IN ECONOMETRIC MODELS*

VICTOR CHERNOZHUKOV, HAN HONG, AND ELIE TAMER

Abstract. This paper provides confidence regions for minima of an econometric criterion function $Q(\theta)$. The minima form a set of parameters, $\Theta_I$, called the identified set. In economic applications, $\Theta_I$ represents a class of economic models that are consistent with the data. Our inference procedures are criterion function based, and so our confidence regions, which cover $\Theta_I$ with a prespecified probability, are appropriate level sets of $Q_n(\theta)$, the sample analog of $Q(\theta)$. When $\Theta_I$ is a singleton, our confidence sets reduce to the conventional confidence regions based on inverting the likelihood or other criterion functions. We show that our procedure is valid under general yet simple conditions, and we provide a feasible resampling procedure for implementing the approach in practice. We then show that these general conditions hold in a wide class of parametric econometric models. In order to verify the conditions, we develop methods of analyzing the asymptotic behavior of econometric criterion functions under set identification and also characterize the rates of convergence of the confidence regions to the identified set. We apply our methods to regressions with interval data and set identified method of moments problems. We illustrate our methods in an empirical Monte Carlo study based on Current Population Survey data.
1. Introduction and Motivation

Parameters  of  interest  in  econometric  models  can  be  defined  as  those  parameter  vectors  that 
minimize  a  population  objective  or  criterion  function.  If  this  criterion  function  is  minimized  uniquely 
at  a  particular  parameter  vector,  then  one  can  obtain  valid  confidence  regions  (or  intervals)  for  this 
parameter  using  a  sample  analog  of  this  function.  Likelihood  and  method  of  moments  procedures  are 
two  commonly  used  and  well  studied  methods  in  this  setting.  This  paper  extends  this  criterion-based 
inference  to  econometric  models  that  are  set  identified,  i.e.,  models  where  the  objective  function  is 
minimized  on  a  set  of  parameters,  the  identified  set.  Our  goal  is  to  make  inferences  directly  on  the 
identified  set  and  to  provide  a  method  of  obtaining  confidence  regions  with  good  properties  (such  as 
consistency  and  equivariance)  that  cover  the  identified  set  with  a  prespecified  probability. 

Our point of departure is a nonnegative population criterion function $Q(\theta)$ and its finite sample analog $Q_n(\theta)$, where $\theta \in \Theta \subset \mathbb{R}^d$. The identified set can be defined as $\Theta_I = \{\theta \in \Theta : Q(\theta) = 0\}$, where every $\theta$ in $\Theta_I$ indexes an economic model that is consistent with the data. The objective of this paper is to construct confidence sets $C_n$ for $\Theta_I$ from the level sets of $Q_n(\theta)$ such that

(1.1)  $\lim_{n \to \infty} P(\Theta_I \subseteq C_n) = \alpha$,

for a prespecified confidence level $\alpha \in (0, 1)$. A level set $C_n(c)$ of the finite sample criterion function $Q_n$ is defined as

$C_n(c) := \{\theta : Q_n(\theta) - q_n \le c/a_n\}$, where $q_n := \inf_{\theta \in \Theta} Q_n(\theta)$ or $q_n := 0$,

for some appropriate normalization $a_n$. In order to obtain the correct coverage (1.1), we choose $C_n = C_n(c)$ with the cut-off level $c = c_\alpha$, where $c_\alpha$ equals the asymptotic $\alpha$-quantile of the coverage statistic:

$\mathcal{C}_n := \sup_{\theta \in \Theta_I} a_n (Q_n(\theta) - q_n)$,

which is a quasi-likelihood-ratio type quantity. The constructed confidence sets possess important properties, such as consistency and equivariance to reparameterization. Our approach covers general situations where the identified set is defined as the set of minimizers of an objective function. In addition, our confidence regions are robust to the failure of point-identifying assumptions in that they cover the unknown identified set, whether it is a set or a point, with a prespecified probability. These confidence sets collapse to the usual confidence intervals based on the likelihood ratio in cases where the identified set $\Theta_I$ is a singleton.

We show that the above level set $C_n(c_\alpha)$ has correct asymptotic coverage, where $c_\alpha$ is the appropriate quantile of a well defined coverage statistic $\mathcal{C}$, the nondegenerate large sample limit of $\mathcal{C}_n$. Then, we provide general resampling methods to consistently estimate the percentile $c_\alpha$. The coverage and resampling results hold under general (high level) conditions. Focusing on a class of econometric models, we then show that these conditions are satisfied. In the process, we provide characterizations of the stochastic properties of the criterion function process, $a_n(Q_n(\cdot) - q_n)$, exploiting the fact that the population criterion $Q$ is minimized on a set rather than a point. Furthermore, we obtain the asymptotics of the coverage statistic $\mathcal{C}_n$ and of the level sets $C_n(c)$, focusing on coverage and speed of convergence of $C_n(c)$ to $\Theta_I$. The paper then considers two important applications: regression with interval censored outcomes and set-identified generalized method-of-moments. We finally illustrate our methods in a Monte Carlo study based on data from the Current Population Survey.

In  the  last  decade,  a  growing  body  of  literature  has  considered  the  problem  of  inference  in 
partially  identified  models,  i.e.,  models  where  parameters  of  interest  are  set  identified,  cf.  Manski 
(2003).  While  most  of  the  work,  e.g.  Horowitz  and  Manski  (1998)  and  Imbens  and  Manski  (2004), 
exclusively  focuses  on  the  case  where  there  is  a  scalar  parameter  of  interest  that  lies  in  an  interval, 
this  paper  is  concerned  with  inference  on  vector  parameters  in  problems  where  the  identified  set 
is  defined  via  a  general  optimization  problem,  as  in  the  economic  problems  described  below.  This 
multivariate  set  is  usually  not  an  interval  (or  cube)  and  can  be  a  set  of  isolated  points  or  manifolds. 
Moreover,  the  methods  used  in  the  interval  case  are  based  on  estimating  the  two  end-points  of 
the  interval.  Hence  they  are  not  applicable  in  the  general  set-identified  case,  for  which  a  different 
approach  is  needed. 

The identified set is a natural object of inference in many examples. In Hansen, Heaton, and Luttmer (1995), the identified set is a subset of asset-pricing models that obey the pricing-error and volatility constraints implicit in asset market returns. In models with multiple equilibria, the identified set is the set of parameters that describe different equilibria supported across markets or industries. There is no single "true" equilibrium that is played, since particular equilibria may vary across observational units; see Ciliberto and Tamer (2003). Another example is the structural instrumental variable estimation of returns to schooling. Suppose that potential income $Y$ is related to education $E$ through a flexible, quadratic functional form, $Y = \alpha_0 + \alpha_1 E + \alpha_2 E^2 + \epsilon$. Although parsimonious, this simple model is not point-identified in the presence of the standard quarter-of-birth instrument $Z$ suggested in Angrist and Krueger (1992) (indicator of the first quarter of birth).[1] In the absence of point identification, all of the parameter values $(\alpha_0, \alpha_1, \alpha_2)$ consistent with the instrumental orthogonality restriction $E[(Y - \alpha_0 - \alpha_1 E - \alpha_2 E^2)(1, Z)'] = 0$ are of interest for purposes of economic analysis. Similar partial identification problems arise in nonlinear moment and instrumental variables problems, see e.g. Demidenko (2000) and Chernozhukov and Hansen (2001).

[1] In some cases the indicators of other quarters of birth are used as instruments. However, these instruments are essentially uncorrelated with education (the correlation is extremely small) and thus bring no additional identification information.

Literature. To our knowledge, the earliest work in econometrics on parametric set identified models can be found in the paper of Marschak and Andrews (1944). There, the identified set is the collection of parameters representing different production functions that are consistent with the data and functional restrictions the authors consider.[2] Gilstein and Leamer (1983) provide set consistent estimation in a class of likelihood models where $\Theta_I$ is the set of parameters that are robust to misspecification. Klepper and Leamer (1984) generalize the Frisch bounds to multivariate regression models with measurement errors. On the other hand, Hansen, Heaton, and Luttmer (1995) provide consistent set estimates of means and standard deviations in a class of asset pricing methods. Manski and Tamer (2002) provide conditions under which an appropriately defined set consistently estimates the identified set. However, these consistency results do not contain a method for inference about the identified set. To the best of our knowledge, there are no results in the literature that deal with the general problem of obtaining confidence regions for parameter sets.[3]

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  provides  a  general  theory  of 
inference  on  the  identified  sets  based  on  general  criterion  functions.  Section  3  focuses  on  a  class  of 
regular  parametric  models,  provides  verification  of  the  regularity  conditions  posed  in  Section  2,  and 
provides  additional  results  that  pertain  to  regular  cases.  Section  4  provides  a  Monte  Carlo  evaluation 
of  the  methods,  and  Section  5  concludes. 

2.  General  Set  Inference  in  Large  Samples 

2.1. Generic Inference on Identified Sets. In this section we present our main result, which forms the basis for the rest of the analysis. We first define the identified set $\Theta_I$ and formalize some definitions that will be used throughout.

For given data, the inference about the parameter set $\Theta_I$ is based on a criterion function $Q_n(\theta) = Q_n(\theta, W_1, \ldots, W_n)$, where the data $\{W_1, \ldots, W_n\}$ are defined on some common probability space $(\Omega, \mathcal{F}, P)$. The criterion function $Q_n(\theta)$ converges to a continuous criterion function $Q(\theta)$ that is minimized at $\Theta_I$, the identified set.

Assumption A.1 (Basic Setup). The criterion functions $Q_n : \mathbb{R}^d \to \mathbb{R}_+$ and $Q : \mathbb{R}^d \to \mathbb{R}_+$ and the parameter space $\Theta$ satisfy

i. $\Theta$ is a compact and convex subset of $\mathbb{R}^d$ [CONVEXITY TO BE RELAXED],

ii. $Q(\theta)$ is continuous and $Q_n(\theta)$ is lower-semicontinuous,

iii. $\Theta_I = \arg\min_{\theta \in \Theta} Q(\theta)$ is a finite union of connected compact subsets of $\Theta$,

iv. $Q(\theta_I) = 0$ for each $\theta_I \in \Theta_I$,

v. $Q_n(\theta) - Q(\theta) = o_p(1)$ for each $\theta \in \Theta$.

[2] For a good description of Marschak and Andrews (1944), see Chapter III of Nerlove (1965).

[3] The only other paper known to us is Imbens and Manski (2004). Their paper considers a different problem of inference about a real parameter $\theta^*$ that is interval-identified (i.e. contained between some upper and lower bounds that can be estimated). The problem of inference about $\theta^*$ is fundamentally different from inference about $\Theta_I$, as shown in Imbens and Manski (2004) and in Appendix G.

Assumption A.1 imposes standard compactness and convexity conditions, which are important to the subsequent analysis. It also defines $\Theta_I$ as the set of minimizers of the limit criterion function $Q$. The region $\Theta_I$ is a finite union of compact connected sets, an assumption that serves to organize the presentation. This covers both the case when $\Theta_I$ is a finite union of isolated points and the case when $\Theta_I$ is a finite union of compact sets with boundaries defined by manifolds (nonlinear hyperplanes).[4] The assumption that $Q(\theta_I) = 0$ serves as a convenient normalization. The pointwise convergence condition identifies $Q$ as the limit of the finite sample objective function. The pointwise convergence will be strengthened later on.

Lemma 2.1 shows how to construct a level set of the sample objective function that will eventually provide the proper inferential statement (1.1) about $\Theta_I$ in a generic setting. The $c$-level set of the objective function $Q_n$ is given by

$C_n(c) := \{\theta : a_n (Q_n(\theta) - q_n) \le c\}$, where $q_n := \inf_{\theta \in \Theta} Q_n(\theta)$ or $q_n := 0$,

where $a_n$ is defined in Assumption A.2 below. Typically $a_n$ equals $n$ in the regular cases studied later. Note that choosing $q_n = \inf_{\theta \in \Theta} Q_n(\theta)$ guarantees that the confidence region $C_n(c)$ is always non-empty, though in some cases one has $\inf_{\theta \in \Theta} Q_n(\theta) = 0$ with probability converging to one, see e.g. Example 1 in Section 3.
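
As an illustration of the level set definition, the following is a minimal sketch that evaluates $C_n(c)$ on a finite grid. The function name `level_set`, the grid, and the toy criterion (zero on $[0.2, 0.5]$, quadratic outside) are assumptions made for this example only and do not come from the applications studied later.

```python
import numpy as np

def level_set(theta_grid, qn_values, a_n, c, use_inf=True):
    """Grid approximation of C_n(c) = {theta : a_n * (Q_n(theta) - q_n) <= c}.

    theta_grid : candidate parameter values (hypothetical discretization of Theta)
    qn_values  : Q_n evaluated at each grid point
    use_inf    : if True, q_n is the minimum of Q_n over the grid; otherwise q_n := 0
    """
    theta_grid = np.asarray(theta_grid, dtype=float)
    qn_values = np.asarray(qn_values, dtype=float)
    q_n = qn_values.min() if use_inf else 0.0
    keep = a_n * (qn_values - q_n) <= c
    return theta_grid[keep]

# Toy criterion (an assumption for illustration): zero on [0.2, 0.5], quadratic outside.
theta_grid = np.linspace(0.0, 1.0, 101)
toy_qn = (np.maximum(0.2 - theta_grid, 0.0) ** 2
          + np.maximum(theta_grid - 0.5, 0.0) ** 2)

region = level_set(theta_grid, toy_qn, a_n=100.0, c=0.5)
print(region.min(), region.max())
```

With $q_n$ taken as the minimum over the grid, the returned set always contains at least the grid minimizer, which mirrors the non-emptiness remark above.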

Consider the following coverage index

(2.1)  $p(c) = c - \sup_{\theta \in \Theta_I} a_n (Q_n(\theta) - q_n)$.

The sign of the index $p(c)$ indicates whether $\Theta_I \subseteq C_n(c)$ or not. For example, if there is $\theta \in \Theta_I$ such that $\theta \notin C_n(c)$, we have $a_n (Q_n(\theta) - q_n) > c$, which implies that $p(c) < 0$, and vice versa. The index is also linear in $c$, which will allow us to have data-dependent cut-off levels $c$. Another desirable property is invariance of the index $p$ to parameter transformations, which implies equivariance of $C_n(c)$ to parameter transformations. For instance, a one-to-one transformation of parameters $\theta \mapsto \tau(\theta)$ changes the level set in the equivariant way

$\{\tau(\theta) : a_n (Q_n(\tau^{-1}(\tau(\theta))) - q_n) \le c\} = \tau(C_n(c))$.

The following lemma summarizes the discussion.

[4] For a definition of manifold, see e.g. Milnor (1964).

Lemma 2.1. Let Assumptions A.1-(ii) and A.1-(iii) hold. Then, I. the coverage property holds: $p(c) < 0 \iff \Theta_I \not\subseteq C_n(c)$ and $p(c) \ge 0 \iff \Theta_I \subseteq C_n(c)$; II. the coverage index is linear in the cut-off level; III. the coverage index is invariant to re-parameterization; and IV. the level sets are equivariant to bijective parameter transformations.
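
The coverage property in part I of the lemma can be checked numerically. The sketch below is only illustrative: the helper names `coverage_index` and `is_covered`, and the three criterion values standing in for $Q_n$ evaluated on a finite subset of $\Theta_I$, are assumptions of the example.

```python
import numpy as np

def coverage_index(c, qn_on_identified_set, q_n, a_n):
    """p(c) = c - sup over Theta_I of a_n * (Q_n(theta) - q_n)."""
    stat = a_n * (np.max(qn_on_identified_set) - q_n)
    return c - stat

def is_covered(c, qn_on_identified_set, q_n, a_n):
    """True when every listed point of Theta_I lies in the level set C_n(c)."""
    scaled = a_n * (np.asarray(qn_on_identified_set, dtype=float) - q_n)
    return bool(np.all(scaled <= c))

# Hypothetical values of Q_n at three points of the identified set.
qn_vals = np.array([0.0, 0.002, 0.004])

for c in (0.3, 0.5):
    p = coverage_index(c, qn_vals, q_n=0.0, a_n=100.0)
    # The sign of p(c) and the inclusion check agree for each cut-off.
    print(c, p >= 0, is_covered(c, qn_vals, q_n=0.0, a_n=100.0))
```

Because the index is linear in $c$, raising the cut-off by some amount raises $p(c)$ by exactly the same amount, which is what permits data-dependent cut-offs later on.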

Next we state the main condition that enables large sample inference on $\Theta_I$.

Assumption A.2 (Coverage Statistic). Suppose that there exists a sequence of constants $a_n \to \infty$ such that

$\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} a_n (Q_n(\theta_I) - q_n) \to_d \mathcal{C}$,

where $\mathcal{C}$ is a nondegenerate random variable.

In the sequel we explain how this main assumption is attained in sufficiently regular models of interest, where $a_n = n$. We also provide methods for verification of this assumption and for finding the limit variable $\mathcal{C}$. Verification of Assumption A.2 is a difficult matter and requires developing a set of new asymptotic methods of dealing with criterion functions under set identification.

Assumption A.2 leads us to the following main theorem, which provides a generic result on inference in set-identified models.

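Theorem 2.1 (Generic Inference on Sets). Suppose Assumptions A.1 and A.2 hold and that $c_\alpha$ is a continuity point of the distribution function of $\mathcal{C}$ such that $P\{\mathcal{C} \le c_\alpha\} = \alpha$. Then for any $\hat{c}_\alpha \to_p c_\alpha$,

I. $p(C_n(\hat{c}_\alpha)) \to_d c_\alpha - \mathcal{C}$ and II. $\lim_{n \to \infty} P\{\Theta_I \subseteq C_n(\hat{c}_\alpha)\} = \alpha$.

The result follows immediately from Lemma 2.1 and A.2. First, $p(C_n(\hat{c}_\alpha)) = \hat{c}_\alpha - \mathcal{C}_n = c_\alpha - \mathcal{C}_n + o_p(1) \to_d c_\alpha - \mathcal{C}$. Second,

$P\{\Theta_I \subseteq C_n(\hat{c}_\alpha)\} = P\{p(C_n(\hat{c}_\alpha)) \ge 0\} = P\{\hat{c}_\alpha - \mathcal{C}_n \ge 0\}$

$= P\{\mathcal{C}_n \le c_\alpha + o_p(1)\} = P\{\mathcal{C} \le c_\alpha\} + o(1)$,

provided $c_\alpha$ is a continuity point of the distribution function of $\mathcal{C}$.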

It is clear from Theorem 2.1 that our method of constructing valid confidence sets builds on the classical principle of inverting some criterion function. Indeed, in point identified cases $\Theta_I$ reduces to a singleton $\{\theta_I\}$, so that the coverage statistic becomes a standard likelihood ratio type quantity $\mathcal{C}_n = a_n (Q_n(\theta_I) - q_n)$, which follows well known limit laws. In the general case, the statistic is a more involved quantity, being the supremum over the elements of $\Theta_I$: $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} a_n (Q_n(\theta_I) - q_n)$.

In practice, our procedure for constructing the confidence regions critically depends on being able to consistently estimate $c_\alpha$, the $\alpha$-quantile of the limit variable $\mathcal{C}$. In all examples that we study, $\mathcal{C}$ is non-standard and its distribution depends on $\Theta_I$. Despite this problem, we will show how to obtain a consistent estimate of $c_\alpha$ by the following method.

2.2. Feasible Inference. We first need to obtain an approximation to the sampling distribution of $\mathcal{C}_n = \sup_{\theta \in \Theta_I} a_n (Q_n(\theta) - q_n)$. Since we do not observe $\Theta_I$, we will replace it by an initial estimate[5]

(2.2)  $\hat{\Theta}_I = C_n(k)$, where $k \in [c_1, c_2] \cdot \ln n$ wp $\to 1$,

where $k$ is a possibly data-dependent starting value of the cut-off. (In the first class of our examples, we can use the starting cut-off $k = 0$.) The result proven below suggests that the asymptotic validity of the procedure will not depend on the starting value. In finite samples, the choice of the starting value may be important, and we discuss it in Section 4.

Consider the following subsampling algorithm:

1. For cases when the data $\{W_i\}$ are iid, construct all $B_n = \binom{n}{b}$ subsets of size $b \ll n$ of the data. For cases when $\{W_i\}$ form a stationary time series, construct $B_n = n - b + 1$ subsets of size $b$ of the form $\{W_i, \ldots, W_{i+b-1}\}$. (In practice, one can use a smaller number $B_n$ of randomly chosen subsets under the condition that $B_n \to \infty$ as $n \to \infty$.)

2. For each $j = 1, \ldots, B_n$, compute

$\mathcal{C}_{j,b,n} = \sup_{\theta \in C_n(c)} a_b (Q_{j,b}(\theta) - q_{j,b})$,

where $q_{j,b} := 0$ if $q_n := 0$ and $q_{j,b} := \inf_{\theta \in \Theta} Q_{j,b}(\theta)$ otherwise; $Q_{j,b}(\theta)$ denotes the criterion function defined using the $j$-th subset of the data only.

3. Let $\hat{c}_\alpha$ be the $\alpha$-th quantile of the sample $\{\mathcal{C}_{j,b,n}, j = 1, \ldots, B_n\}$.

4. (Optional) As commented in Section 4.3, one could repeat steps 2 and 3 a finite number of times using $c = \hat{c}_\alpha \ln n$.[6] (Any finite number of repetitions produces a consistent estimate of $c_\alpha$.)[7]

We require that as $n \to \infty$,

(2.3)  $b/n \to 0$, $B_n \to \infty$, $b \to \infty$ at polynomial rates.

The choice of $b$ and other practical aspects of the procedure are discussed in Section 4.
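
The steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used in Section 4: the scalar interval-mean criterion, the simulated data, the grid over $\Theta$, the subsample size, and the names `criterion` and `subsample_cutoff` are all hypothetical. The sketch uses $q_n := 0$, $a_n = n$, the starting cut-off $k = \ln n$, and a random selection of subsets in place of all $\binom{n}{b}$ of them.

```python
import numpy as np

def criterion(theta, y_lo, y_hi):
    """Toy Q_n(theta): zero iff mean(y_lo) <= theta <= mean(y_hi)."""
    theta = np.asarray(theta, dtype=float)
    below = np.maximum(y_lo.mean() - theta, 0.0)   # theta lies under the lower mean
    above = np.maximum(theta - y_hi.mean(), 0.0)   # theta lies over the upper mean
    return below ** 2 + above ** 2

def subsample_cutoff(y_lo, y_hi, grid, b, alpha, k, n_subsets, rng):
    """Estimate c_alpha by subsampling, with q_n := 0 and a_n = n."""
    n = len(y_lo)
    # Initial estimate of the identified set: the level set C_n(k) on the grid.
    initial_set = grid[n * criterion(grid, y_lo, y_hi) <= k]
    stats = np.empty(n_subsets)
    for j in range(n_subsets):
        # Step 1: a random subset of size b, drawn without replacement.
        idx = rng.choice(n, size=b, replace=False)
        # Step 2: subsample statistic, the sup over the initial set of b * Q_{j,b}.
        q_jb = criterion(initial_set, y_lo[idx], y_hi[idx])
        stats[j] = b * q_jb.max()
    # Step 3: the alpha-quantile of the subsample statistics.
    return np.quantile(stats, alpha), initial_set

rng = np.random.default_rng(0)
n = 500
center = rng.normal(size=n)
y_lo, y_hi = center - 1.0, center + 1.0      # simulated interval data
grid = np.linspace(-3.0, 3.0, 601)           # hypothetical discretization of Theta

c_hat, initial_set = subsample_cutoff(
    y_lo, y_hi, grid, b=50, alpha=0.95, k=np.log(n), n_subsets=200, rng=rng
)
region = grid[n * criterion(grid, y_lo, y_hi) <= c_hat]   # C_n(c_hat) on the grid
print(c_hat, region.min(), region.max())
```

The optional step 4 would rerun `subsample_cutoff` with `k` replaced by `c_hat * np.log(n)`; it is left out here to keep the sketch short.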

[5] The adjustment factor $\ln n$ can be replaced by $\ln \ln n$ or any other $m(n)$ such that $m(n) \to \infty$ and $m(n)/n^{a} \to 0$ as $n \to \infty$ for all $a > 0$.

[6] The adjustment by $\ln n$ is not needed in the first class of our examples.

[7] In fact, we found in the Monte Carlo experiments described in Section 4 that just two repetitions worked best in terms of coverage and computational expense. This was also confirmed by Bajari, Benkard, and Levin (2003).

To guarantee asymptotic validity of the above procedure, the following assumption is needed.
Assumption A.3 (Sandwich Property). Suppose that $b/n \to 0$ and $b \to \infty$ at polynomial rates as $n \to \infty$. For $k \in [c_0, c_1] \cdot \ln n$, wp $\to 1$,

$\Theta_I \subseteq C_n(k) \subseteq \Theta_I^{\epsilon_n}$ such that $a_b \left( \sup_{\theta \in \Theta_I^{\epsilon_n}} Q_b(\theta) - \sup_{\theta \in \Theta_I} Q_b(\theta) \right) = o_p(1)$,

where $\Theta_I^{\epsilon_n} := \{\theta + t : \|t\| \le \epsilon_n, \theta \in \Theta_I\}$ and $\epsilon_n \to 0$ is a sequence of positive constants.

Assumption A.3 guarantees inferential validity but also allows us to establish consistency and to characterize the rate of convergence of the level sets $C_n(k)$ to the identified set $\Theta_I$, as shown below in Theorem 2.2. The intuition behind A.3 is as follows. In the subsampling bootstrap, we do not know $\Theta_I$, hence we replace it by an estimate $C_n(c)$. The replacement should have only a negligible impact on the distribution of the coverage statistic in subsamples. A.3 is similar in nature to the polynomial rate of convergence assumption used by Politis, Romano, and Wolf (1999), p. 44, in the case of Wald inference in point identified cases. There, one does not know the true $\theta_I$ and replaces it with an estimate $\hat{\theta}_I$, hence requiring that this replacement has a negligible effect on the distribution of the Wald statistic in subsamples. We show how to verify A.3 for parametric models in Section 3 (Theorem 3.2). The next theorem summarizes our results for general inference in set identified models.


Theorem 2.2 (Consistency and Validity of Inference). Suppose that the estimation data $\{W_i, i \le n\}$ are iid or form a stationary strongly-mixing sequence and that A.2 and A.3 hold.

I. Then for the subsampling algorithm defined above, provided $P\{\mathcal{C} \le c\}$ is continuous at $c = c_\alpha$,

$P\{\Theta_I \subseteq C_n(\hat{c}_\alpha)\} \to \alpha$.

II. The set $C_n(k)$ for $k = \hat{c}_\alpha \ln n$ is consistent under the Hausdorff metric: wp $\to 1$,

$d_H(C_n(k), \Theta_I) \le \epsilon_n \to 0$.

Recall that the Hausdorff metric between two sets is defined as:

$d_H(A, B) := \max[h(A, B), h(B, A)]$, where $h(A, B) := \sup_{a \in A} \inf_{b \in B} \|a - b\|$.

Thus, under general conditions our inference method is asymptotically valid and delivers level sets $C_n(k)$ that converge to the identified set at the rate $\epsilon_n$ with respect to the Hausdorff distance (the rate $\epsilon_n$ will be shown to be essentially $1/\sqrt{n}$ for parametric models).
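
For sets represented by finitely many points (for example, level sets evaluated on a grid), the Hausdorff distance defined above can be computed directly. The following is a minimal sketch; the function name `hausdorff` and the convention that each set is an array with one point per row are assumptions of the example.

```python
import numpy as np

def hausdorff(a, b):
    """Hausdorff distance between finite point sets a (m x d) and b (k x d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances: entry [i, j] is ||a_i - b_j||.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    h_ab = dist.min(axis=1).max()   # sup over a of inf over b
    h_ba = dist.min(axis=0).max()   # sup over b of inf over a
    return max(h_ab, h_ba)

set_a = np.array([[0.0], [1.0]])
set_b = np.array([[0.0], [1.0], [3.0]])
print(hausdorff(set_a, set_b))
```

Here every point of `set_a` also belongs to `set_b`, so $h(A, B) = 0$, while the point 3 in `set_b` is at distance 2 from `set_a`; the maximum of the two one-sided distances is what makes $d_H$ symmetric.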

Under the given level of generality, subsampling appears to be the only valid resampling method. The conventional (n-out-of-n) bootstrap will not be generically consistent in the present setting. A counterexample is as follows: one of the leading examples in Section 3, the partially identified linear IV regression, necessarily involves a parameter-on-the-boundary problem. It is known that the bootstrap fails in parameter-on-the-boundary problems, cf. Andrews (2000).

3.  Regular  Parametric  Case  and  Applications 

In this section we establish methods for verifying the main conditions required for implementing the approach, namely the existence of the limit distribution for the coverage statistic (A.2) and the sandwich condition (A.3). We develop these methods to cover a variety of parametric examples, and illustrate the approach with applications to regression models with missing outcome data and generalized method of moments under partial identification. For parametric models, we establish further properties of the confidence regions, such as the speed of convergence of the level sets to the identified set, and the stochastic properties of objective functions in partially identified cases.

3.1.  Examples  of  Parametric  Problems  with  Set  Identification.  Example  I.  Regression 
with  Interval-Censored  Outcomes. 

Consider the linear conditional expectation models

$E_{P_\theta}[Y|X] = X'\theta$, where $\theta \in \Theta$, $X \in \mathbb{R}^d$.

The models are not assumed to be correctly specified, in the sense that there may be no $\theta$ such that $E_{P_\theta}[Y|X]$ agrees with $E_P[Y|X]$ under the actual law $P$ of the data.

Observed data consist of i.i.d. observations $(Y_{1i}, Y_{2i}, X_i)$, where $Y_{1i}$ and $Y_{2i}$ represent the interval observation on $Y_i$:

(3.1)  $Y_i \in [Y_{1i}, Y_{2i}]$ given $X_i$, a.s.

In the absence of further information, the set

(3.2)  $\Theta_I = \{\theta \in \mathbb{R}^d : E[Y_1|X] \le X'\theta \le E[Y_2|X] \text{ a.s.}\}$

is the object of interest, as it represents the set of linear conditional expectation models that are consistent with the data. We assume that $\Theta_I \subset \Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^d$. Observe that $\Theta_I$ minimizes the objective function

(3.3)  $Q(\theta) = \int \left[ (E[Y_1|x] - x'\theta)_+^2 + (E[Y_2|x] - x'\theta)_-^2 \right] dP(x)$,

where $(u)_+^2 = (u)^2 \times 1[u > 0]$ and $(u)_-^2 = (u)^2 \times 1[u < 0]$. Notice also that $\Theta_I = \{\theta \in \Theta : Q(\theta) = 0\}$. Using a sample analog of (3.3),

(3.4)  $Q_n(\theta) = \frac{1}{n} \sum_{i \le n} \left[ (E[Y_{1i}|X_i] - X_i'\theta)_+^2 + (E[Y_{2i}|X_i] - X_i'\theta)_-^2 \right]$,

Manski and Tamer (2002) characterize consistent estimates of $\Theta_I$.[8] However, no method for inference about $\Theta_I$ in the sense of providing inferential statements was given. We provide below a confidence approach to inference about $\Theta_I$ in this and similar examples.
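
To make the sample criterion (3.4) concrete, the sketch below evaluates it when the regressors take finitely many values, replacing the conditional means $E[Y_{1i}|X_i]$ and $E[Y_{2i}|X_i]$ by cell averages. Both the discreteness of $X$ and the cell-average estimator are assumptions made for this illustration, as are the function name and the four-observation data set.

```python
import numpy as np

def interval_regression_criterion(theta, x, y_lo, y_hi):
    """Q_n(theta) from (3.4) with conditional means replaced by cell averages.

    Assumes x (n x d) takes finitely many distinct rows, so that E[Y1|x] and
    E[Y2|x] can be estimated by averaging within each distinct row of x.
    """
    theta = np.asarray(theta, dtype=float)
    cells, inverse = np.unique(x, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    mean_lo = np.bincount(inverse, weights=y_lo) / counts
    mean_hi = np.bincount(inverse, weights=y_hi) / counts
    fit = cells @ theta
    under = np.maximum(mean_lo - fit, 0.0)   # (E[Y1|x] - x'theta)_+
    over = np.minimum(mean_hi - fit, 0.0)    # (E[Y2|x] - x'theta)_-
    return np.sum(counts * (under ** 2 + over ** 2)) / len(y_lo)

# Hypothetical data: an intercept and one binary regressor.
x = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y_lo = np.array([0.0, 0.0, 1.0, 1.0])
y_hi = np.array([1.0, 1.0, 2.0, 2.0])

print(interval_regression_criterion([0.5, 1.0], x, y_lo, y_hi))  # fitted values inside both intervals
print(interval_regression_criterion([2.0, 0.0], x, y_lo, y_hi))  # fitted value above the first interval
```

Any $\theta$ whose fitted values fall between the two cell means in every cell gives a criterion value of zero, which is the sample counterpart of the set in (3.2).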

Example II. Structural Moment Equations. In method of moments settings, we are interested in deducing the set of all economic models indexed by a parameter $\theta \in \mathbb{R}^d$ that satisfy the moment equation computed with respect to the probability sampling distribution of the economic data. The data $(X_i, i \le n)$ are stationary and strongly mixing, defined on the probability space $(\Omega, \mathcal{F}, P)$. The economic models $\theta_I$ of interest are assumed to satisfy

(3.5)  $E[m_i(\theta_I)] = 0$,

where $m_i(\theta) = m(\theta, X_i)$ is a lower-semi-continuous function in $\theta$ a.s. The entire set of models $\Theta_I \subset \Theta$ that solve (3.5) also minimize the criterion function

(3.6)  $Q(\theta) = E[m_i(\theta)]' W(\theta) E[m_i(\theta)]$,

where $Q(\theta)$ is continuous for each $\theta \in \Theta$, a compact subset of $\mathbb{R}^d$, and $W(\theta)$ is a continuous and positive definite matrix for each $\theta \in \Theta$. In nonlinear models, the existence of multiple solutions to the nonlinear equations (3.5) is more the rule than the exception, except in special cases, see e.g. Demidenko (2000). In linear models there may be multiple solutions as well if the usual rank conditions fail to hold, as mentioned in the introduction. Thus, the GMM function (3.6) will in general be minimized on a set. The inference on $\Theta_I$ may be based on the usual GMM function

(3.7)  $Q_n(\theta) = n [g_n(\theta)]' W_n(\theta) [g_n(\theta)]$,  $g_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} m_i(\theta)$,

where $W_n(\theta)$ is a lower-semi-continuous, uniformly positive definite matrix.[9] To the best of our knowledge, no methods for inference about $\Theta_I$ in the sense of providing Neyman-Pearson confidence statements have yet been given in such settings.
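
As an illustration of a GMM criterion minimized on a set, the sketch below evaluates (3.7) as printed for the quadratic schooling model from the introduction, with moment function $m_i(\theta) = (Y_i - \alpha_0 - \alpha_1 E_i - \alpha_2 E_i^2)(1, Z_i)'$. The simulated data, the identity weight matrix, and all names are assumptions of the example. With two moment conditions and three parameters, the sample moments vanish along an entire line of parameter values.

```python
import numpy as np

def gmm_criterion(theta, y, educ, z, weight=None):
    """Q_n(theta) = n * g_n' W g_n for m_i = (y - a0 - a1*e - a2*e^2) * (1, z)'."""
    theta = np.asarray(theta, dtype=float)
    resid = y - theta[0] - theta[1] * educ - theta[2] * educ ** 2
    instruments = np.column_stack([np.ones(len(z)), z])
    g = instruments.T @ resid / len(y)          # sample moment vector g_n(theta)
    w = np.eye(2) if weight is None else weight  # identity weight: an assumption
    return len(y) * (g @ w @ g)

# Simulated (hypothetical) data: binary instrument, schooling between 8 and 16 years.
rng = np.random.default_rng(1)
n = 200
z = rng.integers(0, 2, size=n).astype(float)
educ = rng.integers(8, 17, size=n).astype(float)
y = 1.0 + 0.1 * educ + rng.normal(size=n)

# The moments are linear in theta: g_n(theta) = b_vec - a_mat @ theta.
instruments = np.column_stack([np.ones(n), z])
regressors = np.column_stack([np.ones(n), educ, educ ** 2])
a_mat = instruments.T @ regressors / n           # 2 x 3 matrix
b_vec = instruments.T @ y / n

theta_one = np.linalg.lstsq(a_mat, b_vec, rcond=None)[0]  # one solution of the moments
null_dir = np.linalg.svd(a_mat)[2][-1]                    # direction along which g_n is constant
theta_two = theta_one + null_dir                          # a second, distinct solution

print(gmm_criterion(theta_one, y, educ, z), gmm_criterion(theta_two, y, educ, z))
```

Both parameter vectors set the sample criterion to zero up to rounding error, so the sample minimizer is a set rather than a point, in line with the failure of the usual rank condition discussed above.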

3.2. General Asymptotics of Criterion Functions in Parametric Models. The prime goal of this section is to provide primitive conditions and tools that help verify conditions A.2 and A.3 in a wide variety of parametric models. The tools will be illustrated using the examples stated in the next section.

[8] Specifically, they show that the set $\hat{\Theta}_n = \{\theta \in \Theta : Q_n(\theta) \le \min_{\theta' \in \Theta} Q_n(\theta') + \epsilon_n\}$ converges almost surely to the set $\Theta_I$ if $\epsilon_n > 0$ converges to zero at a rate strictly slower than the rate at which $Q_n(\cdot)$ converges uniformly to $Q(\cdot)$. They use the Hausdorff metric as a distance function between sets.

[9] For example, it is convenient to use the continuous updating type weight matrix $W_n(\theta)$, i.e. to let $W_n(\theta)$ equal a consistent estimate of the asymptotic variance of $n^{-1/2} \sum_{i=1}^{n} m_i(\theta)$.
Since $Q_n(\theta)$ approaches $Q(\theta)$, we should be able to tell that $\theta \notin \Theta_I$ if $\theta$ is outside some neighborhood of $\Theta_I$. The size of this neighborhood depends on the rate at which the boundary of $\Theta_I$ can be learned. In the remainder of the paper, the rate we consider is the parametric rate, and hence the points of uncertainty are those $\theta$ that are within a $1/\sqrt{n}$ neighborhood of the true set $\Theta_I$. Such points are of the form

$\theta_I + \lambda/\sqrt{n}$, for $\theta_I \in \Theta_I$, $\lambda \in \mathbb{R}^d$.

In order to keep track of such points it will suffice to record all pairs of the form $(\theta_I, \lambda)$. Hence of central interest is the local empirical process

$(\theta_I, \lambda) \mapsto \ell_n(\theta_I, \lambda) := n (Q_n(\theta_I + \lambda/\sqrt{n}) - Q(\theta_I))$,

over a suitable domain. The pertinent domain for $\theta_I$ will be shown to be the boundary of the identified set, $\partial\Theta_I$. Given $\theta_I \in \partial\Theta_I$, the pertinent domain for $\lambda$ is

$V_n(\theta_I) := \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n'} \in \Theta \text{ for all } n' \in [n, \infty)\}$.

In addition, the following limit version of $V_n(\theta_I)$ will play an important role:

(3.8)  $V_\infty(\theta_I) := \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n} \in \Theta \text{ for all sufficiently large } n\}$,

where, when $\theta_I \in \mathrm{int}(\Theta)$, $V_\infty(\theta_I) = \mathbb{R}^d$.

Thus, $V_\infty(\theta_I)$ plays the role of the limit local parameter space relative to $\theta_I$. When $\theta_I$ is in the interior of the parameter space, i.e. $\theta_I \in \mathrm{int}(\Theta)$, $V_\infty(\theta_I) = \mathbb{R}^d$. When $\theta_I$ is on the boundary of the parameter space, i.e. $\theta_I \in \partial\Theta$, the local deviations $\lambda$ should be constrained to the local parameter space $V_\infty(\theta_I)$ of the specified form. This situation is similar to the one arising in the point-identified case, as characterized in Andrews (1999). Unlike in the point-identified case, the boundary problem in partially identified models is more the rule than the exception. It arises even in the simplest leading cases, as will be seen in Section 3.3.

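Another important set is the subset of the local parameter space $V_n(\theta_I)$ where the local deviations $\lambda$ are constrained to be towards the interior of the identified set:

$\Lambda_n(\theta_I) := \{0\} \cup \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n'} \in \mathrm{int}(\Theta_I) \text{ for all } n' \in [n, \infty)\}$.

In addition, the following limit version of $\Lambda_n(\theta_I)$ will play an important role:

$\Lambda_\infty(\theta_I) := \{0\} \cup \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n} \in \mathrm{int}(\Theta_I) \text{ for all sufficiently large } n\}$.

Note that since $\mathrm{int}(\Theta_I) \subset \Theta$, $\Lambda_n(\theta_I)$ is a subset of $V_n(\theta_I)$, which is a subset of $V_\infty(\theta_I)$, and $\Lambda_\infty(\theta_I)$ is also a subset of $V_\infty(\theta_I)$.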
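
To fix ideas, the local sets defined above can be worked out in a simple scalar case. The specific choice $d = 1$, $\Theta = [0, 1]$, $\Theta_I = [0.2, 0.5]$ and the boundary point $\theta_I = 0.2$ is an assumption made purely for illustration and is not one of the applications studied in the paper.

```latex
\begin{align*}
V_n(0.2) &= \{\lambda : 0 \le 0.2 + \lambda/\sqrt{n'} \le 1 \text{ for all } n' \ge n\}
          = [-0.2\sqrt{n},\ 0.8\sqrt{n}], \\
V_\infty(0.2) &= \mathbb{R}, \\
\Lambda_n(0.2) &= \{0\} \cup \{\lambda : 0.2 < 0.2 + \lambda/\sqrt{n'} < 0.5 \text{ for all } n' \ge n\}
          = [0,\ 0.3\sqrt{n}), \\
\Lambda_\infty(0.2) &= [0, \infty).
\end{align*}
```

The inclusions $\Lambda_n(0.2) \subset V_n(0.2) \subset V_\infty(0.2)$ and $\Lambda_\infty(0.2) \subset V_\infty(0.2)$ can be read off directly. If instead $\Theta_I = [0, 0.5]$, so that $\theta_I = 0$ lies on the boundary of $\Theta$ itself, the same calculation gives $V_\infty(0) = [0, \infty)$, which is the parameter-on-the-boundary situation mentioned above.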



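Given the above definitions, we shall obtain the following representations of the coverage statistic:

$\mathcal{C}_n := \sup_{\theta_I \in \Theta_I} n (Q_n(\theta_I) - q_n)$

$= \begin{cases} \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) & \text{if } q_n := 0, \\ \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) - \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta} \ell_n(\theta_I, \lambda) & \text{if } q_n := \inf_{\theta \in \Theta} Q_n(\theta), \end{cases}$

$= \begin{cases} \sup_{\theta_I \in \partial\Theta_I, \lambda \in \Lambda_n(\theta_I)} \ell_n(\theta_I, \lambda) + o_p(1) & \text{if } n q_n \to_p 0, \\ \sup_{\theta_I \in \partial\Theta_I, \lambda \in \Lambda_n(\theta_I)} \ell_n(\theta_I, \lambda) - \inf_{\theta_I \in \Theta_I, \lambda \in V_n(\theta_I)} \ell_n(\theta_I, \lambda) + o_p(1) & \text{if } n q_n \not\to_p 0. \end{cases}$

In this representation, the first equality follows by definition of the local empirical process $\ell_n(\theta, \lambda)$, while the second equality emerges from the analysis given in the appendix. To guarantee non-degenerate asymptotics for this statistic, we will require that $\ell_n(\theta_I, \lambda)$ converges to a sufficiently well-behaved random element $\ell_\infty(\theta_I, \lambda)$, in the finite-dimensional sense coupled with some uniformity, which we will call quasi-uniform convergence. This form of convergence will be sufficient to establish that

$\mathcal{C}_n \to_d \mathcal{C} := \begin{cases} \sup_{\theta_I \in \partial\Theta_I, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \to_p 0, \\ \sup_{\theta_I \in \partial\Theta_I, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) - \inf_{\theta_I \in \Theta_I, \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \not\to_p 0. \end{cases}$

The following conditions ensure that this convergence takes place.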


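Assumption C.1 (Degenerate Asymptotics on the Interior). When $\operatorname{int}(\Theta_I)$ is not empty, so that $\Theta_I \neq \partial\Theta_I$, for any $\epsilon > 0$ there exist a large enough $\delta > 0$ and a large enough $n$ such that, for $I_n(\delta) := \{\theta_I \in \operatorname{int}(\Theta_I) : \inf_{\theta_I' \in \partial\Theta_I} \|\theta_I - \theta_I'\| > \delta/\sqrt{n}\}$, $\sup_{\theta_I \in I_n(\delta)} |\ell_n(\theta_I, 0)| = 0$ with probability no less than $1 - \epsilon$. Since $\ell_n(\theta_I, \lambda) \ge 0$, this implies that

$$n q_n = \inf_{\theta_I \in \Theta_I,\, \lambda \in V_n(\theta_I)} \ell_n(\theta_I, \lambda) = 0 \quad \text{wp} \to 1.$$

Condition C.1 is motivated by the fact that in interval regression examples, for large $n$,

$$\ell_n(\theta_I, \lambda) = 0 \quad \text{wp} \to 1 \text{ for any fixed } \theta_I \in \operatorname{int}(\Theta_I) \text{ and } \lambda \in \mathbb{R}^d.$$

We provide examples and further discussion of this interesting phenomenon in Section 3.3.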

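Condition C.2 imposes further convergence conditions by appropriately matching the large sample behavior of the function $\ell_n(\theta, \lambda)$ with that of some function $\ell_\infty(\theta, \lambda)$, which we call the quasi-uniform limit of $\ell_n(\theta_I, \lambda)$. In order to state the condition, define the following key functionals:

$$u_n(\lambda) := \sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \lambda) \quad \text{and} \quad l_n(\lambda) := \inf_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \lambda) \quad \text{for } n < \infty \text{ and } n = \infty.$$

Assumption C.2 (Quasi-Uniform Convergence Near Boundary). Let $\theta_I \in \partial\Theta_I$ and $\lambda \in K$, where $K$ is any compact subset of $\mathbb{R}^d$. Then
i. $\ell_n(\theta_I, \lambda) \ge 0$ converges weakly in the finite-dimensional sense to some function $\ell_\infty(\theta_I, \lambda) \ge 0$, which is continuous in $\lambda$ for each $\theta_I$.
ii. If $\Theta_I \neq \partial\Theta_I$, (a) $u_n(\lambda)$ converges weakly to $u_\infty(\lambda)$ in the finite-dimensional sense, and (b) $u_n(\lambda)$ is stochastically equicontinuous; if $\Theta_I = \partial\Theta_I$, then (a) $(u_n(\lambda), l_n(\lambda))$ jointly converge weakly to $(u_\infty(\lambda), l_\infty(\lambda))$ in the finite-dimensional sense, and (b) $u_n(\lambda)$ and $l_n(\lambda)$ are stochastically equicontinuous.
iii. $\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) < \infty$ a.s.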

Condition C.2 extends the standard conditions used to derive the asymptotics of criterion functions in the point-identified case, where $\Theta_I$ is a single point $\theta_I$; see e.g. Newey and McFadden (1994). Although the asymptotics in the interior are degenerate, as condition C.1 states, the asymptotics near the boundary are assumed to be non-degenerate. C.2-(i)-(ii) imposes the conditions required for obtaining the limit distribution of coverage statistics and for verifying A.3. C.2-(i) requires that $\ell_n(\theta, \lambda)$ converges in the finite-dimensional sense to some limit $\ell_\infty(\theta, \lambda)$. C.2-(ii) requires that the sup and inf transformations of $\ell_n(\theta_I, \lambda)$ over $\partial\Theta_I$, denoted $u_n(\lambda)$ and $l_n(\lambda)$, converge in the finite-dimensional sense to the respective sup and inf transformations of $\ell_\infty(\theta_I, \lambda)$, denoted $u_\infty(\lambda)$ and $l_\infty(\lambda)$. It also requires that $u_n(\lambda)$ and $l_n(\lambda)$ are stochastically equicontinuous. C.2-(iii) ensures tightness of the limit coverage statistic $\mathcal{C}$, since $\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda)$ will be one of the components of $\mathcal{C}$.

Note that in C.2-(ii), it is possible to require that

$$(\theta_I, \lambda) \mapsto \ell_n(\theta_I, \lambda) \text{ is stochastically equicontinuous over } \partial\Theta_I \times K,$$

where $K$ is any compact subset of $\mathbb{R}^d$. This stronger and simpler condition certainly implies C.2-(ii). However, this stronger condition fails to hold in some cases of interest (the regression model with interval-censored outcome), while it holds in others (the generalized method of moments).

Assumption C.3 (Local Quadratic Bound). For any $\epsilon > 0$, there is a sufficiently large positive $K$ such that, with probability at least as large as $1 - \epsilon$ for large enough $n$,

$$\ell_n(\theta_I, \lambda) \ge C_1 \cdot n \cdot \min[\nu(\theta_I, \lambda), \delta]^2$$

uniformly in $(\theta_I, \lambda)$ such that $\nu(\theta_I, \lambda) \ge K/\sqrt{n}$, where $\nu(\theta_I, \lambda) = \inf_{\theta_I' \in \Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta_I'\|$, for some positive constants $C_1$ and $\delta$ that do not depend on $\epsilon$.

Condition C.3 is needed to obtain the rate of convergence for set estimates, and along with C.1 and C.2 allows us to verify the main previous high-level conditions A.2 and A.3. C.3 extends, to the set-identified case, the standard conditions needed to obtain the rate of convergence of extremum estimators in the point-identified case; see the conditions in Theorem 5.52 in Van der Vaart (1998) and Newey and McFadden (1994), p. 2185.


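Theorem 3.1 (Limits of $\mathcal{C}_n$). Suppose A.1 and C.1-C.3 hold. Then A.2 holds; in particular,

$$
\mathcal{C}_n \to_d \mathcal{C} := \begin{cases}
\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \to_p 0, \\
\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) - \inf_{\theta_I \in \Theta_I,\, \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \not\to_p 0,
\end{cases}
$$

where $n q_n \to_p 0$ necessarily occurs if $q_n := 0$ or if $\operatorname{int}(\Theta_I)$ is non-empty in $\Theta$.


Theorem 3.1 verifies Assumption A.2. The theorem states that under a set of conditions, which may be easily checked in examples of interest, the coverage statistic attains a limit distribution, which is determined by the appropriate inf and sup transformations of the quasi-uniform limit $\ell_\infty(\theta_I, \lambda)$ over $\partial\Theta_I$ and the local parameter spaces $\Lambda_\infty(\theta_I)$ and $V_\infty(\theta_I)$.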

In  addition  to  this  basic  result,  we  would  like  to  know  certain  properties  of  the  level  sets. 

Theorem 3.2 (Rate of Convergence and Sandwich Property). Suppose A.1 and C.1-C.3 hold. Then, for some positive constants $C$ and $C'$,

I. For any $k \to_p \infty$ such that $k = o_p(n)$,

$$\Theta_I \subseteq C_n(k) \subseteq \Theta_I^{\epsilon_n} \quad \text{wp} \to 1, \quad \text{where } \epsilon_n = C \cdot \sqrt{k/n},$$

so that

$$d_H(C_n(k), \Theta_I) \le C \cdot \sqrt{k/n} \quad \text{wp} \to 1;$$

II. For any $k \in [c_0, c_1] \ln n$, the sandwich condition A.3 takes place,

$$\Theta_I \subseteq C_n(k) \subseteq \Theta_I^{\epsilon_n} \quad \text{wp} \to 1, \quad \text{where } \epsilon_n = C' \cdot \ln n/\sqrt{n},$$

and, provided $b \to \infty$ and $b/n \to 0$ at a polynomial rate,

$$b\,\Big(\sup_{\theta \in \Theta_I^{\epsilon_n}} Q_b(\theta) - \sup_{\theta \in \Theta_I} Q_b(\theta)\Big) = o_p(1).$$

Theorem 3.2 verifies Assumption A.3 for parametric models and characterizes the rate of convergence of the level sets to the identified set. The rate of convergence, as measured by the Hausdorff metric, is essentially $1/\sqrt{n}$.
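3.3. Analysis of Regression with Interval-Censored Outcomes. First consider the case when $X_i = 1$ and suppose $E[Y_1] < E[Y_2]$. Then $\Theta_I = \{\theta : E[Y_1] \le \theta \le E[Y_2]\}$, that is, $\Theta_I = [E[Y_1], E[Y_2]]$. Then

$$Q_n(\theta) = (\bar{Y}_1 - \theta)_+^2 + (\bar{Y}_2 - \theta)_-^2,$$

and

$$\ell_n(\theta_I, \lambda) = n\left((\bar{Y}_1 - \theta_I - \lambda/\sqrt{n})_+^2 + (\bar{Y}_2 - \theta_I - \lambda/\sqrt{n})_-^2\right).$$

Suppose that

$$\sqrt{n}\,(\bar{Y}_1 - EY_1, \bar{Y}_2 - EY_2)' \to_d (W_1, W_2)' \sim N(0, \Omega).$$

Then

$$\ell_n(\theta_I, \lambda) = 0 \quad \text{wp} \to 1 \quad \text{if } \theta_I \in (EY_1, EY_2),$$

and the finite-dimensional limit of

$$(\ell_n(EY_1, \lambda), \ell_n(EY_2, \lambda))' \quad \text{is} \quad ((W_1 - \lambda)_+^2, (W_2 - \lambda)_-^2)'.$$

Therefore the finite-dimensional limit of $\ell_n(\theta_I, \lambda)$ is given by

$$\ell_\infty(\theta_I, \lambda) = (W_1 - \lambda)_+^2\, 1(\theta_I = EY_1) + (W_2 - \lambda)_-^2\, 1(\theta_I = EY_2).$$

Theorem 3.3 verifies C.1-C.3 for this example, so that $\ell_\infty(\theta_I, \lambda)$ is also the quasi-uniform limit of $\ell_n(\theta_I, \lambda)$. This implies by Theorem 3.1 that

$$\mathcal{C}_n \to_d \mathcal{C} = \sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) = \max\left[(W_1)_+^2, (W_2)_-^2\right].$$

One can use the above distribution to obtain the critical values and the corresponding level set with the required coverage property.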
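The last equality can be checked by hand. The following short derivation is a sketch that assumes the conventions $(x)_+ = \max(x, 0)$, $(x)_- = \max(-x, 0)$ and $(x)_\pm^2 = ((x)_\pm)^2$, which this section uses but does not restate.

```latex
% Boundary of the identified set: \partial\Theta_I = \{EY_1, EY_2\}.
% At \theta_I = EY_1, EY_1 + \lambda/\sqrt{n} lies in (EY_1, EY_2) for all large n iff \lambda > 0;
% at \theta_I = EY_2, the same holds iff \lambda < 0. Adding \{0\} gives the two half-lines below.
\begin{aligned}
\Lambda_\infty(EY_1) &= [0, \infty), &
\sup_{\lambda \ge 0} (W_1 - \lambda)_+^2 &= (W_1)_+^2
  && \text{since } \lambda \mapsto (W_1 - \lambda)_+ \text{ is non-increasing}, \\
\Lambda_\infty(EY_2) &= (-\infty, 0], &
\sup_{\lambda \le 0} (W_2 - \lambda)_-^2 &= (W_2)_-^2
  && \text{since } \lambda \mapsto (W_2 - \lambda)_- \text{ is non-decreasing}.
\end{aligned}
```

Taking the larger of the two suprema over the two boundary points gives $\max[(W_1)_+^2, (W_2)_-^2]$.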

Before proceeding to a more general case, notice that the inferential strategy developed in this simple example can be used in other problems where $\Theta_I$ is defined as the set of all $\theta \in \Theta$ such that $F_1 \le \theta \le F_2$, where $F_1$ and $F_2$ are functionals of the distribution of the observed data. This is a set of models that provide interval bounds on the parameters of interest. All one needs to do in this class of models is to derive the joint asymptotic distribution of the sample analogs of the endpoints of the interval of interest and obtain via simulation the critical values for the maximum of two random variables, as outlined above for the case when $F_1 = EY_1$ and $F_2 = EY_2$.
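The simulation step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the limit $(W_1, W_2)' \sim N(0, \Omega)$ from the simple example, the function name `simulate_critical_value` is hypothetical, and the identity matrix used for $\Omega$ is a placeholder that in practice would be replaced by an estimate of the joint covariance of the endpoint estimators.

```python
import numpy as np


def simulate_critical_value(omega, alpha=0.95, n_draws=200_000, seed=0):
    """Simulate the alpha-quantile of max[(W1)_+^2, (W2)_-^2] for (W1, W2)' ~ N(0, omega)."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(np.zeros(2), omega, size=n_draws)
    lower_endpoint_part = np.maximum(w[:, 0], 0.0) ** 2   # (W1)_+^2
    upper_endpoint_part = np.maximum(-w[:, 1], 0.0) ** 2  # (W2)_-^2
    statistic = np.maximum(lower_endpoint_part, upper_endpoint_part)
    return float(np.quantile(statistic, alpha))


# Placeholder covariance (an assumption for illustration only).
omega_example = np.eye(2)
c_95 = simulate_critical_value(omega_example, alpha=0.95)
c_50 = simulate_critical_value(omega_example, alpha=0.50)
```

With the simulated cutoff in hand, the confidence region in the simple example would be the level set of all $\theta$ with $n Q_n(\theta)$ no larger than that cutoff.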

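Getting back to more general settings, suppose that

$$X \in \{x_1, \dots, x_J\} \quad P\text{-a.s.}, \tag{3.9}$$

where the first component of $X$ is 1. Condition (3.9) assumes that $X$ is discrete. The identified set $\Theta_I$ is determined by the set of inequalities:

$$\Theta_I := \{\theta \in \mathbb{R}^d : x_j'\theta \ge \tau_1(x_j) \text{ and } x_j'\theta \le \tau_2(x_j) \text{ for all } j \le J\}. \tag{3.10}$$

It is assumed that $\Theta_I \subset \Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^d$, and that the $d \times J$ matrix

$$\mathbf{X} := (x_j, j \le J) \text{ has rank } d, \tag{3.11}$$

which rules out redundant parameterization. Then the boundary of the identified set, $\partial\Theta_I$, is

$$\partial\Theta_I = \{\theta_I \in \Theta_I : x_j'\theta_I = \tau_1(x_j) \text{ or } x_j'\theta_I = \tau_2(x_j) \text{ for some } j \le J\}, \tag{3.12}$$

and $\operatorname{int}(\Theta_I)$ is empty in $\mathbb{R}^d$, so that $\Theta_I = \partial\Theta_I$, if and only if $\tau_1(x_j) = \tau_2(x_j)$ for some $j$.
Define $\tau_1(x) = E(Y_1 | x)$ and $\tau_2(x) = E(Y_2 | x)$ and consider the objective function

$$Q_n(\theta) := \frac{1}{n} \sum_{i=1}^{n} \left[(\hat\tau_1(X_i) - X_i'\theta)_+^2 + (\hat\tau_2(X_i) - X_i'\theta)_-^2\right], \tag{3.13}$$

$$\hat\tau_k(x_j) := \frac{1}{n_j} \sum_{i : X_i = x_j} Y_{ki}, \quad k = 1, 2, \quad j = 1, \dots, J.$$

In this case, for $n_j := \sum_{i=1}^{n} 1[X_i = x_j]$,

$$\ell_n(\theta_I, \lambda) = n \sum_{j=1}^{J} \frac{n_j}{n} \left((\hat\tau_1(x_j) - x_j'\theta_I - x_j'\lambda/\sqrt{n})_+^2 + (\hat\tau_2(x_j) - x_j'\theta_I - x_j'\lambda/\sqrt{n})_-^2\right).$$

Define $\hat{W}_{1j} := \sqrt{n}\,(\hat\tau_1(x_j) - \tau_1(x_j))$ and $\hat{W}_{2j} := \sqrt{n}\,(\hat\tau_2(x_j) - \tau_2(x_j))$ for $j = 1, \dots, J$. Assume also that a central limit theorem and a law of large numbers apply, so that

$$((\hat{W}_{11}, \hat{W}_{21}), \dots, (\hat{W}_{1J}, \hat{W}_{2J}))' \to_d ((W_{11}, W_{21}), \dots, (W_{1J}, W_{2J}))' \sim N(0, \Omega), \quad \text{and}$$
$$n_j/n \to_p p_j \quad \text{for each } j = 1, \dots, J.$$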

Theorem 3.3 (Interval Regression). Let the basic conditions specified in Section 3.2 hold and also (3.9)-(3.14) hold. Then C.1-C.3 and A.1-A.3 are satisfied. Moreover,

$$\ell_\infty(\theta_I, \lambda) = \sum_{j=1}^{J} p_j \left[(W_{1j} - x_j'\lambda)_+^2\, 1(x_j'\theta_I = \tau_1(x_j)) + (W_{2j} - x_j'\lambda)_-^2\, 1(x_j'\theta_I = \tau_2(x_j))\right],$$

$$
\mathcal{C} = \begin{cases}
\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \to_p 0, \\
\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) - \inf_{\theta_I \in \Theta_I,\, \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \not\to_p 0,
\end{cases}
$$

where $n q_n = 0$ wp $\to 1$ occurs when $\operatorname{int}(\Theta_I)$ is non-empty or if $q_n := 0$.

This theorem verifies Assumptions C.1-C.3 and A.1-A.3, which implies that the results of Theorems 2.1, 2.2, 3.1, and 3.2 apply to the interval regression case. Therefore the inference procedure proposed in Section 2 is valid in this case. Further, the confidence regions have the stochastic properties given in Theorems 3.1 and 3.2.

The distribution of the limit variable $\mathcal{C}$ is not pivotal and has no known closed analytical form, but this does not create a problem for the inferential method proposed in Section 2. $\mathcal{C}$ cannot be simplified much further, unlike in the trivial case with $X_i = 1$. One can simplify the leading term as

$$\sup_{\theta_I \in \partial\Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) = \sup_{\theta_I \in \partial\Theta_I} \sum_{j=1}^{J} p_j \left[(W_{1j})_+^2\, 1(x_j'\theta_I = \tau_1(x_j)) + (W_{2j})_-^2\, 1(x_j'\theta_I = \tau_2(x_j))\right],$$

but the other terms do not appear to simplify in the above expressions.

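3.4. Analysis of the Structural Moment Equations Model. We first work out a simple example, and then generalize it. Consider the usual two-stage-least-squares model

$$Y_i = X_i'\theta_I + \epsilon_i(\theta_I), \qquad E\,\epsilon_i(\theta_I) Z_i = 0,$$

which leads to the following objective function

$$Q_n(\theta) = \Big(\frac{1}{n}\sum_{i=1}^{n} (Y_i - X_i'\theta) Z_i\Big)' \Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \Big(\frac{1}{n}\sum_{i=1}^{n} (Y_i - X_i'\theta) Z_i\Big).$$

Assume that $0 < r = \operatorname{rank} EXZ' < \dim(\theta)$, so that the identified set $\Theta_I$ consists of an $r$-dimensional linear subspace of $\mathbb{R}^d$ intersected with $\Theta$. $\Theta_I$ is defined as follows: for any $\theta_0$ such that $E\,\epsilon(\theta_0) Z = 0$,

$$\Theta_I = \{\theta = \theta_0 + \delta \in \Theta \text{ such that } \delta' EXZ' = 0\}.$$

Note that $\Theta_I$ has empty interior relative to $\mathbb{R}^d$. Under the usual circumstances, even in the case when there is weak identification, $Q_n(\theta)$ is pivotal and can be inverted for confidence intervals. In the partially identified case, the statistic most relevant to making inference on $\Theta_I$ is the empirical process $(Q_n(\theta_I), \theta_I \in \Theta_I)$, which is no longer pivotal.

Suppose for simplicity that $q_n := 0$ (which is also true when $q_n := \inf_{\theta \in \Theta} Q_n(\theta)$ but $\dim(Z) \le \dim(X)$), so that

$$\ell_n(\theta_I, \lambda) = \Big(\Delta_n(\theta_I) + \lambda' \frac{1}{n}\sum_{i=1}^{n} X_i Z_i'\Big)' \Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \Big(\Delta_n(\theta_I) + \lambda' \frac{1}{n}\sum_{i=1}^{n} X_i Z_i'\Big),$$

where $\Delta_n(\theta_I) := n^{-1/2} \sum_{i=1}^{n} \epsilon_i(\theta_I) Z_i$. Under standard sampling conditions,

$$\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i' \to_p EZZ', \quad \frac{1}{n}\sum_{i=1}^{n} Z_i X_i' \to_p EZX', \quad \text{and } \{\epsilon(\theta_I) Z, \theta_I \in \Theta_I\} \text{ is Donsker.}$$

Hence the finite-dimensional and quasi-uniform limit of $\ell_n(\theta_I, \lambda)$ is given by

$$\ell_\infty(\theta_I, \lambda) = \left(\Delta(\theta_I) + \lambda' EXZ'\right)' \left(EZZ'\right)^{-1} \left(\Delta(\theta_I) + \lambda' EXZ'\right),$$

where $(\Delta(\theta_I), \theta_I \in \Theta_I)$ is the weak limit of the empirical process $(\Delta_n(\theta_I), \theta_I \in \Theta_I)$. For instance, under iid sampling, provided that $\{\epsilon(\theta_I) Z, \theta_I \in \Theta_I\}$ has a square-integrable envelope, the limit $\Delta(\cdot)$ is a zero-mean Gaussian process with covariance kernel given by $E\,\Delta(\theta_I)\Delta(\theta_I')' = E\,\epsilon(\theta_I)\epsilon(\theta_I') Z Z'$. We therefore conclude that C.1-C.3 are satisfied and thus

$$\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \Delta_n(\theta_I)' \Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \Delta_n(\theta_I) \to_d \mathcal{C} = \sup_{\theta_I \in \Theta_I} \Delta(\theta_I)' \left[EZZ'\right]^{-1} \Delta(\theta_I).$$

Notice that the limit is not pivotal and depends on knowing $\Theta_I$.^{10} Also, compactness of $\Theta_I$ is a necessary condition for the limit variable $\mathcal{C}$ to be finite.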

Next, we generalize our method to the general nonlinear method of moments. Let the following partial identification condition hold: there are positive constants $C$ and $\delta$ such that

$$\|E m_i(\theta)\|^2 \ge C \cdot \min\Big[\inf_{\theta_I \in \Theta_I} \|\theta - \theta_I\|, \delta\Big]^2, \quad \text{uniformly for } \theta \in \Theta. \tag{3.15}$$

This is both a partial identification condition and a smoothness assumption. This condition implies that $E m_i(\theta) = 0$ if and only if $\theta \in \Theta_I$, and also imposes smoothness on the behavior of $E m_i(\theta)$ for points $\theta$ near $\Theta_I$.

In the point-identified case, global identification and the full rank and continuity of the Jacobian $\nabla_\theta E m_i(\theta)$ near $\theta_I$ ordinarily imply (3.15); see e.g. Theorem 3.3 in Pakes and Pollard (1998). In the set-identified case, the Jacobian may be degenerate, which necessitates the statement of a more careful condition (3.15). For example, in the previous linear IV model we have that $E m_i(\theta) = EZ'X(\theta - \theta_I^*)$, where $\theta_I^*$ is the closest point to $\theta$ in $\Theta_I$. Provided that $\|\theta_I^* - \theta\| > 0$, the vector $(\theta - \theta_I^*)$ is orthogonal to the hyperplane $\{v : EZ'Xv = 0\}$. Hence if $\operatorname{rank} EZ'X$ is non-zero, for $C_0$ denoting the minimal positive eigenvalue of $(EX'Z)(EZ'X)$, we have $\|EZ'X(\theta - \theta_I^*)\|^2 \ge C_0 \cdot \|\theta - \theta_I^*\|^2$.
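The eigenvalue bound in the linear IV example can be checked numerically. The sketch below is purely illustrative: the matrix `jacobian` is a randomly generated rank-deficient stand-in for $EZ'X$, and the dimensions are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_theta, dim_moments, rank = 4, 3, 2

# Hypothetical rank-deficient matrix playing the role of E Z'X (dim_moments x dim_theta).
jacobian = rng.normal(size=(dim_moments, rank)) @ rng.normal(size=(rank, dim_theta))

# Eigen-decomposition of the symmetric matrix (E X'Z)(E Z'X).
eigvals, eigvecs = np.linalg.eigh(jacobian.T @ jacobian)
is_positive = eigvals > 1e-10 * eigvals.max()
c0 = eigvals[is_positive].min()  # minimal positive eigenvalue

# A direction orthogonal to the null space {v : (E Z'X) v = 0}.
direction = eigvecs[:, is_positive] @ rng.normal(size=int(is_positive.sum()))

lhs = np.linalg.norm(jacobian @ direction) ** 2
rhs = c0 * np.linalg.norm(direction) ** 2
print(lhs >= rhs)
```

The inequality holds because, in the eigenbasis, the left side is a weighted sum of squared coordinates with weights no smaller than $C_0$.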

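Theorem 3.4 (Generalized Method of Moments). Let the basic conditions specified in Section 3.1 hold and let i. $\Theta_I = \partial\Theta_I$, ii. $\{m_i(\theta), \theta \in \Theta\}$ be a Donsker class, iii. $E m_i(\theta)$ satisfy (3.15) and have continuous Jacobian $G(\theta) = \nabla_\theta E m_i(\theta)$, iv. $W_n(\theta) = W(\theta) + o_p(1)$ uniformly in $\theta$, where $W(\theta)$ is positive definite and continuous for all $\theta$. Then C.1-C.3 and A.1-A.3 hold, and

$$\ell_\infty(\theta_I, \lambda) = \left(\Delta(\theta_I) + \lambda' G(\theta_I)\right)' W(\theta_I) \left(\Delta(\theta_I) + \lambda' G(\theta_I)\right),$$

$$
\mathcal{C} = \begin{cases}
\sup_{\theta_I \in \Theta_I} \ell_\infty(\theta_I, 0) & \text{if } n q_n \to_p 0, \\
\sup_{\theta_I \in \Theta_I,\, \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) - \inf_{\theta_I \in \Theta_I,\, \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } n q_n \not\to_p 0,
\end{cases}
$$

where $\theta_I \mapsto \Delta(\theta_I)$ is the weak limit of the process $\theta_I \mapsto \Delta_n(\theta_I)$, a zero-mean Gaussian process with the covariance function $E[\Delta(\theta_I)\Delta(\theta_I')'] = \lim_{n \to \infty} E\big[n^{-1} \sum_{i=1}^{n} m_i(\theta_I) \sum_{i=1}^{n} m_i(\theta_I')'\big]$. Above, $n q_n \to_p 0$ necessarily occurs if $q_n := 0$ or if $q_n := \inf_{\theta \in \Theta} Q_n(\theta)$ but $\dim(m_i(\theta)) \le \dim(\theta)$.

^{10} In the case $\dim(Z) > \dim(X)$ and $q_n = \inf_{\theta \in \Theta} Q_n(\theta)$, $\ell_\infty(\theta_I, \lambda) = \left(\Delta(\theta_I) + \lambda' EXZ'\right)' \left[EZZ'\right]^{-1} \left(\Delta(\theta_I) + \lambda' EXZ'\right)$, and $\mathcal{C}_n \to_d \mathcal{C} = \sup_{\theta_I \in \Theta_I} \ell_\infty(\theta_I, 0) - \inf_{\theta_I \in \Theta_I,\, \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda)$.

The theorem verifies the conditions C.1-C.3 and A.1-A.3, which implies that the results of Theorems 2.1, 2.2, 3.1, and 3.2 apply to the GMM case. Therefore the inference procedure proposed in Section 2 is valid in this case. Further, the confidence regions have the stochastic properties given in Theorems 3.1 and 3.2. Similarly to the interval regression case, $\mathcal{C}$ depends on $\Theta_I$. Hence the distribution of the limit variable $\mathcal{C}$ is not pivotal and has no known closed analytical form, but this does not create a problem for the inferential method proposed in Section 2.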

4.  Computation  and  Empirical  Monte  Carlo 

We  illustrate  our  methods  above  with  an  empirical  Monte  Carlo  that  uses  data  from  the  CPS. 
We  also  provide  details  on  the  computational  method  that  can  be  used  to  construct  the  confidence 
sets. 

4.1. Computation. An issue in the subsampling method is being able to use an efficient numerical algorithm to construct level sets of an objective function. Ideally, one would like to use a rich set of grid points and evaluate the function on those points. However, as the dimension of $\theta$ increases, constructing a simple uniform grid becomes computationally infeasible. The Metropolis-Hastings algorithm provides a computationally attractive method for generating adaptive grid sets. The details of the numerical approach we use are summarized in the following algorithm:

1. Generate a grid of points $\mathcal{B} = \{\theta_1, \dots, \theta_K\}$ using the Metropolis-Hastings algorithm.^{11}

2. Given a starting critical value $c_0$ and $k = c_0 \ln n$, we can compute the level set of the objective function as: $C_n(k) = \{\theta_g \in \mathcal{B} : n\,(Q_n(\theta_g) - \min_{\theta_g \in \mathcal{B}} Q_n(\theta_g)) \le k\}$.

3. At each subsampling stage $j$, where $j = 1, \dots, B_n$, compute $Q_b(\theta_g)$ for all $\theta_g \in C_n(k)$, and the subsample coverage statistic, e.g. $\mathcal{C}_{j,b,n} = b\,(\max_{\theta_g \in C_n(k)} Q_b(\theta_g) - \min_{\theta_g \in C_n(k)} Q_b(\theta_g))$.

4. Compute the $\alpha$-quantile $\hat{c}(\alpha)$ of $\{\mathcal{C}_{j,b,n}, j = 1, \dots, B_n\}$.

5. Then compute $C_n(\hat{c}(\alpha)) = \{\theta_g \in \mathcal{B} : n\,(Q_n(\theta_g) - \min_{\theta_g \in \mathcal{B}} Q_n(\theta_g)) \le \hat{c}(\alpha)\}$.
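The five steps above can be sketched in a few lines for the simplest interval-data case ($X_i = 1$). This is an illustration under stated assumptions rather than the authors' implementation: a plain uniform grid stands in for the Metropolis-Hastings grid of step 1, the criterion is the two-term objective of Section 3.3, and all function names, the simulated data, and the tuning values (`b`, `n_subsamples`, `c0`) are hypothetical.

```python
import numpy as np


def criterion(theta, y_lower, y_upper):
    """Q(theta) = (mean(y_lower) - theta)_+^2 + (mean(y_upper) - theta)_-^2, vectorized in theta."""
    below = np.maximum(y_lower.mean() - theta, 0.0)
    above = np.maximum(theta - y_upper.mean(), 0.0)
    return below ** 2 + above ** 2


def level_set(grid, q_values, scale, cutoff):
    """Grid points whose scaled, recentered criterion value does not exceed the cutoff."""
    return grid[scale * (q_values - q_values.min()) <= cutoff]


def subsampling_confidence_set(y_lower, y_upper, grid, b, n_subsamples, alpha, c0, rng):
    n = len(y_lower)
    q_n = criterion(grid, y_lower, y_upper)
    prelim = level_set(grid, q_n, n, c0 * np.log(n))             # step 2
    stats = np.empty(n_subsamples)
    for j in range(n_subsamples):                                 # step 3
        idx = rng.choice(n, size=b, replace=False)                # subsample without replacement
        q_b = criterion(prelim, y_lower[idx], y_upper[idx])
        stats[j] = b * (q_b.max() - q_b.min())
    c_hat = float(np.quantile(stats, alpha))                      # step 4
    return level_set(grid, q_n, n, c_hat), c_hat                  # step 5


# Simulated interval data: the latent outcome is only known to lie in [floor, floor + 1].
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
y_lower = np.floor(latent)
y_upper = y_lower + 1.0
grid = np.linspace(-2.0, 2.0, 401)  # step 1, with a uniform grid as a stand-in
conf_set, c_hat = subsampling_confidence_set(
    y_lower, y_upper, grid, b=100, n_subsamples=200, alpha=0.95, c0=1.0, rng=rng
)
```

A uniform grid is workable here only because the parameter is scalar; in higher dimensions the adaptive grid of step 1 is what keeps the computation feasible.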


^{11} This algorithm is a valuable method of generating grid points adaptively, so that they are placed in relevant regions only. See Robert and Casella (1998) for a detailed description, and Chernozhukov and Hong (2003) for related examples in non-likelihood settings.
An important practical issue is the choice of the initial point $c_0$. Theorem 2.2 suggests that the consistency of the estimate $\hat{c}_\alpha$ is not affected by the choice of $c_0$ as long as $c_0$ is bounded in the sense of (2.2). Hence a reasonable procedure for selecting $c_0$ can be based on pointwise testing procedures. For example, in GMM a pointwise critical value for testing $Q(\theta) = 0$ at $\theta$ is given by the $\alpha$-quantile of a chi-squared variable with degrees of freedom equal to the number of equations. See the next section for how to choose $c_0$ in the interval regression case. More generally, one can use the $\alpha$-quantile of the asymptotic distribution of the coverage statistic computed under the assumption that $\Theta_I$ is a singleton.

4.2. Empirical Monte Carlo: Returns to Schooling in the CPS. We examine the actual finite-sample performance of the inferential procedures proposed in this paper using data from the Current Population Survey. Our "population" is a sample of white men ages 20 to 50 from the March 2000 wave of the CPS. The wages and salaries series are not top coded or censored, and so we are able to use this population to construct

1.  confidence  regions  for  the  returns  to  schooling  parameters  in  the  point  identified  case,  and 

2.  confidence  regions  covering  the  identified  set  when  we  artificially  bracket  the  data. 

Our Monte Carlo is based on random sampling from the original data set. Table 1 below provides summary statistics for our data (population). The "true" returns to schooling coefficient is .0533 with a constant term of 3.91, obtained from a least squares regression of log of wages on education in the "population". We start first by describing the details of the computational procedure.

TABLE 1. Population Summary Statistics

Variable              Obs      Mean       Std Dev     Min    Max
Wages and Salaries    13290    66667.6    51968.41    1      513472
Education             13290    11.77      1.89        1

4.3. Starting values and Implementation Details: The algorithm used to obtain estimates of $\Theta_I$ is the same as the one described in Section 4.1 above. The steps of the procedure are as follows:

1. We draw a sample of size $n$ from the above population (this population will be artificially bracketed below).

2. We build an initial estimate $C_n(c_0)$ at the starting cutoff level $c_0$ (choosing $c_0$ is described next).

3. In the subsampling step, we obtain the consistent estimate $\hat{c}_\alpha$ of the cutoff level by subsampling the coverage statistic. At this point one can go back to step 2 above, set $c_0 = \hat{c}_\alpha \ln n$, and then repeat step 3.^{12} One may iterate several times, but we find that the cutoff levels that we get are very close after at most two iterations.

4. In all of the above steps we use the numerical procedures as described in Section 4.1.

The starting value $c_0$ that is used to construct the initial level set of the objective is problem specific. For example, in the method of moments case, the critical value from a $\chi^2$ distribution with the appropriate degrees of freedom can be used as the starting value. In the interval regression example, we choose an initial value $c_0$ in the following way. Let $Y_i^* = \tfrac{1}{2}(Y_{1i} + Y_{2i})$ and $\tilde{Y}_{1i} = \tilde{Y}_{2i} = Y_i^*$; then we use the objective function $Q_n(\theta)$ applied to the data $(\tilde{Y}_{1i}, \tilde{Y}_{2i}, X_i, i \le n)$. This generates an auxiliary model, which is point-identified at some value $\theta_0$, and for which we can compute the limit distribution of

$$n\,(Q_n(\theta_0) - Q_n(\hat\theta_0)),$$

and take its quantiles as the starting value $c_0$. In what follows, we used subsampling to estimate $c_0$, which is a consistent method of estimating $c_0$ by Theorem 2.6.1 in Politis, Romano, and Wolf (1999).

4.4. Properties of the Set Inference Procedure in the Point-identified model. Here, the data on wages are not interval measured (the model is point identified), but we nonetheless apply our procedure to obtain the confidence region. The objective function that we use is the minimum distance objective function

$$Q_n(\theta) = \sum_{j=1}^{J=16} \frac{n_j}{n} \left[(\hat\tau_1(x_j) - x_j'\theta)_+^2 + (\hat\tau_2(x_j) - x_j'\theta)_-^2\right]. \tag{4.1}$$

Since wages are not interval-measured, we have that $\hat\tau_1(x_j) = \hat\tau_2(x_j)$, $j = 1, \dots, J$. The results obtained are compared to the joint confidence ellipse obtained using the usual Wald inference approach. In Figure 1, we provide graphs for $n = 400$ and $n = 2000$ observations and see that the subsampling procedure is close to the ellipse obtained using the usual chi-squared approximation. In Table 2, we examine the coverage of our inferential procedure. We provide the coverage for two sample sizes, $n = 1000$ and $n = 2000$, using $R = 600$ simulations. We report the coverage for a sequence of subsample sizes. As we can see, coverage seems monotonic initially in the subsample size; for the case where $n = 1000$, it peaks at $b = 200$, while for the case where $n = 2000$, it peaks at $b = 500$ and comes close to 95%.

Figure 1. Point Identified Case: Subsampling vs $\chi^2$ Ellipses. [Plot not reproduced; legend: Subsampling Ellipse, Chi Squared Ellipse; axis label: Constant.]

Table 2. Finite-Sample Coverage Property in the Point Identified Model

n = 1000
Subsample Size    b=50     b=80     b=120    b=200    b=300
Coverage (95%)    85.1     88       87.2     93.1     91.2

n = 2000
Subsample Size    b=200    b=300    b=400    b=500    b=600
Coverage (95%)    86.1     88.2     90.5     95.01    93.3

^{12} In the interval regression example and similar situations, $\ln n$ may be dropped.

4.5. Properties of the Set Inference Procedure in the Set-Identified Model. To examine the identified set of parameters in the case of censoring of the dependent variable, we bracket the income data into 16 different categories. These brackets are (in thousands)

[0,5], [5,7.5], [7.5,10], [10,12.5], [12.5,15], [15,20], [20,25], [25,30],
[30,35], [35,40], [40,50], [50,60], [60,75], [75,100], [100,150], [150,100000]

Notice here that the topcoding is artificially set at 100 million dollars. To give a flavor of the data, we obtain 718 observations in the first bracket, 1211 belong to [50,60], 1598 belong to [100,150], while 734 are making above 150. The objective function is the same as the one used earlier. Figures 2 through 5 show our 95% confidence region $C_n(\hat{c}_{.95})$ for sample sizes $n = 600$, 2000, 4000, and 10000 drawn at random without replacement from the original "population".

Using the artificially bracketed population, we can obtain the identified set by collecting the parameters that satisfy the following set of $J$ inequalities corresponding to the set of $J$ values that the level of education takes:

$$E[y_1 | x_j] \le b_0 + b_1 x_j \le E[y_2 | x_j], \quad j = 1, \dots, J.$$

We then graph the identified set $\Theta_I$ along with the 95% confidence region $C_n(\hat{c}_{.95})$. Notice that as the sample size increases, the set estimates shrink towards the identified set. For example, at $n = 600$, our set estimate of the intercept is [3.2, 4.71] while the true range is [3.6, 4.4].
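Collecting the parameters that satisfy the $J$ inequalities can be sketched on a grid as follows. The function name, the grid ranges, and the toy bounds are all hypothetical; in the application the bounds would be the conditional means of the bracket endpoints at each education level.

```python
import numpy as np


def identified_set_on_grid(x_values, lower, upper, b0_grid, b1_grid):
    """Return the grid points (b0, b1) with lower_j <= b0 + b1 * x_j <= upper_j for every j."""
    b0, b1 = np.meshgrid(b0_grid, b1_grid, indexing="ij")
    fitted = b0[..., None] + b1[..., None] * x_values        # shape (len(b0_grid), len(b1_grid), J)
    satisfied = ((fitted >= lower) & (fitted <= upper)).all(axis=-1)
    return np.column_stack([b0[satisfied], b1[satisfied]])


# Toy bounds (illustrative only): the band [x_j, x_j + 1] at three covariate values.
x_values = np.array([0.0, 1.0, 2.0])
lower = x_values.copy()
upper = x_values + 1.0
points = identified_set_on_grid(
    x_values, lower, upper,
    b0_grid=np.linspace(0.0, 1.0, 11),
    b1_grid=np.linspace(0.5, 1.5, 11),
)
```

Each row of `points` is one admissible (intercept, slope) pair, so the array can be plotted directly against a confidence region computed on the same grid.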


To calculate coverage probabilities, we draw $R = 600$ random samples from the population and record whether our set estimates using these samples cover the identified set. For every sample, we construct the set estimate, and calculate coverage in the following manner. Numerically, we store the identified set $\Theta_I$ and $C_n(\hat{c}_\alpha)$ as arrays, and then check whether all the points in $\Theta_I$ are contained in $C_n(\hat{c}_\alpha)$. As Table 3 and Figures 2-5 show, the coverage is similar to the point identified case and the set estimates converge to the identified set as the sample size increases. The numerical performance of our inferential methods supports the large sample theory developed in Theorems 3.1-3.3.

Table 3. Finite-Sample Coverage Property in the Set Identified Model

N = 1000
Subsample Size    b=100    b=150    b=200    b=250    b=300
Coverage (95%)    84.1     90.2     91.3     88.5     82.8

N = 2000
Subsample Size    b=200    b=300    b=400    b=500    b=600
Coverage (95%)    86.34    90.1     92.2     94.3     91.2
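The array-based containment check described in this section might look like the sketch below. It assumes that both sets are stored as arrays of points taken from a common grid, so that membership can be tested up to a small numerical tolerance; the function names and the example points are hypothetical.

```python
import numpy as np


def covers(identified_points, confidence_points, tol=1e-9):
    """True if every stored point of the identified set also appears in the confidence set."""
    gaps = np.abs(identified_points[:, None, :] - confidence_points[None, :, :]).max(axis=-1)
    return bool((gaps.min(axis=1) <= tol).all())


def coverage_frequency(cover_flags):
    """Share of Monte Carlo replications in which the identified set was covered."""
    return float(np.mean(cover_flags))


# Illustrative points only.
identified = np.array([[0.5, 1.0], [0.6, 1.0]])
wide_region = np.array([[0.4, 1.0], [0.5, 1.0], [0.6, 1.0], [0.7, 1.0]])
narrow_region = np.array([[0.5, 1.0]])

flags = [covers(identified, wide_region), covers(identified, narrow_region)]
```

Averaging such flags over the $R$ replications gives the coverage frequencies of the kind reported in the tables.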

5.  Conclusion 

This  paper  provided  confidence  regions  for  identified  sets  in  models  with  partial  identification. 
The  proposed  inference  procedures  are  criterion  function  based,  and  our  confidence  regions  are  certain 
level  sets  of  the  criterion  function  in  finite  samples.  In  the  case  when  the  model  is  point-identified, 
our  confidence  sets  reduce  to  the  conventional  confidence  regions  based  on  inverting  the  likelihood 
or  other  criterion  functions.  The  proposed  procedure  was  shown  to  be  valid  under  general  yet 
simple  conditions.  Along  with  inferential  procedures,  we  have  developed  methods  of  analyzing  the 
asymptotic  behavior  of  econometric  criterion  functions  under  set  identification  and  also  characterized 
the  rates  of  convergence  of  the  confidence  regions  to  the  identified  set.  We  applied  our  methods  to 
regressions  with  interval  data  and  set  identified  method  of  moments  problems.  We  also  assessed  the 
performance  of  the  methods  in  Monte  Carlo  experiments  based  on  the  Current  Population  Survey 
data.  We  found  that  the  methods  perform  well  and  in  accordance  with  the  asymptotic  theory 
developed. 
Appendix A. Appendix
We use the following notation for empirical processes in the sequel: for $W = (Y, X)$,

$$E_n f(W) = \frac{1}{n} \sum_{i=1}^{n} f(W_i), \qquad G_n f(W) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(f(W_i) - E f(W_i)\right).$$

$E$ denotes expectation; applied to an estimated function $\hat{f}$, it denotes the expectation evaluated at $\hat{f}$:

$$E \hat{f}(W_i) = \left(E f(W_i)\right)_{f = \hat{f}}.$$

Outer and inner probabilities, $P^*$ and $P_*$, and the corresponding notions of weak convergence are defined as in van der Vaart and Wellner (1996). $\to_p$ denotes convergence in outer probability, and $\to_d$ means convergence in distribution under $P^*$; wp $\to 1$ means "with the inner probability approaching 1," and "wp $\ge 1 - \epsilon$" means "with the inner probability no smaller than $1 - \epsilon$ for sufficiently large $n$". A table of notation is given at the end of the Appendix.

Appendix B. Proofs of Lemma 2.1, Theorem 2.1, and Theorem 2.2.
B.1. Proof of Lemma 2.1. Proof is given in the main text. □

B.2. Proof of Theorem 2.1. Proof is given in the main text. □

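B.3. Proof of Theorem 2.2. Step I. We have that

$$\hat{\mathcal{C}}_{j,b,n} = \sup_{\theta \in C_n(\hat{c})} a_b\,(Q_{j,b}(\theta) - q_{j,b}), \tag{B.1}$$

where $q_{j,b} := 0$ if $q_n := 0$ and $q_{j,b} := \inf_{\theta \in \Theta} Q_{j,b}(\theta)$ otherwise; $Q_{j,b}(\theta)$ denotes the criterion function defined using the $j$-th subset of the data only. Define

$$\hat{G}_{b,n}(x) := B_n^{-1} \sum_{j=1}^{B_n} 1\{\hat{\mathcal{C}}_{j,b,n} \le x\} = B_n^{-1} \sum_{j=1}^{B_n} 1\{\mathcal{C}_{j,b,n} \le x - (\hat{\mathcal{C}}_{j,b,n} - \mathcal{C}_{j,b,n})\}, \tag{B.2}$$

where

$$\mathcal{C}_{j,b,n} = \sup_{\theta \in \Theta_I} a_b\,(Q_{j,b}(\theta) - q_{j,b}). \tag{B.3}$$

In what follows, the main step is to show that $\hat{\mathcal{C}}_{j,b,n}$ can be replaced by $\mathcal{C}_{j,b,n}$. This will be possible due to the sandwich assumption, and despite the fact that the constant $k$ used in the construction of the preliminary set estimate $C_n(k)$ is data-dependent.

By A.3, wp $\to 1$, for some deterministic set $\Theta_I^{\epsilon_n}$ we have

$$\Theta_I \subseteq C_n(\hat{c}) \subseteq \Theta_I^{\epsilon_n}, \tag{B.4}$$

where $\Theta_I^{\epsilon_n} := \{\theta_I + t : \|t\| \le \epsilon_n, \theta_I \in \Theta_I\}$. Hence wp $\to 1$, for all $j \le B_n$,

$$\underbrace{\sup_{\theta \in \Theta_I} a_b\,(Q_{j,b}(\theta) - q_{j,b})}_{\mathcal{C}_{j,b,n}} \le \underbrace{\sup_{\theta \in C_n(\hat{c})} a_b\,(Q_{j,b}(\theta) - q_{j,b})}_{\hat{\mathcal{C}}_{j,b,n}} \le \underbrace{\sup_{\theta \in \Theta_I^{\epsilon_n}} a_b\,(Q_{j,b}(\theta) - q_{j,b})}_{\bar{\mathcal{C}}_{j,b,n}}. \tag{B.5}$$

Hence wp $\to 1$

$$\bar{G}_{b,n}(x) := B_n^{-1} \sum_{j=1}^{B_n} 1\{\bar{\mathcal{C}}_{j,b,n} \le x\} \le \hat{G}_{b,n}(x) \le G_{b,n}(x) := B_n^{-1} \sum_{j=1}^{B_n} 1\{\mathcal{C}_{j,b,n} \le x\}. \tag{B.6}$$

By A.2, $\mathcal{C}_b \to_d \mathcal{C}$, and by A.3

$$\bar{\mathcal{C}}_b = \mathcal{C}_b + o_p(1) \to_d \mathcal{C}, \tag{B.7}$$

so that by Step II

$$\bar{G}_{b,n}(x) \to_p G(x) = P\{\mathcal{C} \le x\}, \qquad G_{b,n}(x) \to_p G(x) = P\{\mathcal{C} \le x\}, \tag{B.8}$$

if $x$ is a continuity point of $G(x)$, which proves that

$$\hat{G}_{b,n}(x) \to_p G(x) \tag{B.9}$$

at the continuity points $x$ of $G(x)$. Furthermore, if $G$ is continuous at $c_\alpha = G^{-1}(\alpha)$,

$$\hat{c}_\alpha = \hat{G}_{b,n}^{-1}(\alpha) \to_p c_\alpha = G^{-1}(\alpha), \tag{B.10}$$

since convergence of distribution functions at continuity points implies the convergence of quantile functions at continuity points. Finally, claim I, that

$$P(\Theta_I \subseteq C_n(\hat{c}_\alpha)) \to \alpha, \tag{B.11}$$

follows by Theorem 2.1.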

Step II. This part shows (B.8). Write

(B.12)  $G_{b,n}(x) = B_n^{-1} \sum_{j=1}^{B_n} 1\{\mathcal{C}_{j,b,n} \le x\} \overset{(a)}{=} P\{\mathcal{C}_b \le x\} + o_p(1) \overset{(b)}{=} P\{\mathcal{C} \le x\} + o_p(1)$

at the continuity points $x$ of $G(x) = P\{\mathcal{C} \le x\}$, as long as

(B.13)  $b/n \to 0, \quad b \to \infty, \quad n \to \infty;$

where (a) follows from

(B.14)  $\mathrm{Var}\Bigl(B_n^{-1} \sum_{j=1}^{B_n} 1\{\mathcal{C}_{j,b,n} \le x\}\Bigr) = o(1)$

by the variance bound for bounded U-statistics for i.i.d. series and $\alpha$-mixing series given on pages 45 and 72 in Politis, Romano, and Wolf (1999); and (b) follows by

(B.15)  $\mathcal{C}_b \to_d \mathcal{C}$, as $b \to \infty$,

and the definition of convergence in distribution on $\mathbb{R}$.
Likewise, conclude

(B.16)  $\bar G_{b,n}(x) = B_n^{-1} \sum_{j=1}^{B_n} 1\{\bar{\mathcal{C}}_{j,b,n} \le x\} = P\{\bar{\mathcal{C}}_b \le x\} + o_p(1) = P\{\mathcal{C} \le x\} + o_p(1)$

at the continuity points $x$ of $G(x) = P\{\mathcal{C} \le x\}$.
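The subsampling distribution function in (B.12) and the quantile in (B.10) can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, subsets are drawn at random without replacement (an i.i.d.-data simplification), and `stat` stands in for the map from a subsample to the statistic $\mathcal{C}_{j,b,n}$.

```python
import numpy as np

def subsample_statistics(data, b, stat, n_subsamples, rng):
    """Evaluate `stat` on n_subsamples random subsets of size b (without replacement)."""
    n = len(data)
    out = np.empty(n_subsamples)
    for j in range(n_subsamples):
        idx = rng.choice(n, size=b, replace=False)
        out[j] = stat(data[idx])
    return out

def empirical_cdf(values, x):
    """G(x) = B^{-1} * sum_j 1{values_j <= x}."""
    return float(np.mean(values <= x))

def empirical_quantile(values, alpha):
    """Smallest order statistic v such that empirical_cdf(values, v) >= alpha."""
    s = np.sort(values)
    k = int(np.ceil(alpha * len(s))) - 1
    return float(s[max(k, 0)])
```

In the setting of Theorem 2.2, `stat` would compute $\sup_{\theta \in C_n(\hat k)} a_b (Q_{j,b}(\theta) - q_{j,b})$ on the $j$-th subsample, and `empirical_quantile` applied to the resulting values plays the role of $\hat c_\alpha$.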

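Step III. Wp $\to 1$,

(B.17)  $\Theta_I \subseteq C_n(\hat k) \ \Rightarrow\ h(\Theta_I, C_n(\hat k)) = 0.$

Then, wp $\to 1$

(B.18)  $C_n(\hat k) \subseteq \Theta_I^{\epsilon_n} \ \Rightarrow\ h(C_n(\hat k), \Theta_I) \le h(\Theta_I^{\epsilon_n}, \Theta_I) \le \epsilon_n.$

Hence wp $\to 1$

(B.19)  $d_H(\Theta_I, C_n(\hat k)) \le \epsilon_n.$

Thus Claim II is proven. □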
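The role of the one-sided distance $h$ and the Hausdorff distance $d_H$ in Step III can be checked on finite point sets. The sketch below assumes $h(A, B) = \sup_{a \in A} \inf_{b \in B} \|a - b\|$ and $d_H(A, B) = \max[h(A, B), h(B, A)]$; the function names are hypothetical and the sets are discretized for illustration only.

```python
import numpy as np

def directed_distance(A, B):
    """h(A, B) = max over a in A of the distance from a to the set B.

    A and B are arrays of shape (number_of_points, dimension).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(pairwise.min(axis=1).max())

def hausdorff_distance(A, B):
    """d_H(A, B) = max[h(A, B), h(B, A)]."""
    return max(directed_distance(A, B), directed_distance(B, A))

# Toy example: Theta_I = [0, 1] on a grid, C_n = [-0.5, 1.5] on a grid.
theta_I = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
C_n = np.linspace(-0.5, 1.5, 9).reshape(-1, 1)
```

With these toy sets, `directed_distance(theta_I, C_n)` is zero because the first set is contained in the second, so the Hausdorff distance is determined entirely by `directed_distance(C_n, theta_I)`, mirroring (B.17)-(B.19).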


Appendix  C.  Proof  of  Theorem  3.1 


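First recall that

(C.1)  $\mathcal{C}_n := \sup_{\theta_I \in \Theta_I} n\bigl(Q_n(\theta_I) - q_n\bigr) = \begin{cases} \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) & \text{if } q_n := 0, \\ \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) - \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta} \ell_n(\theta_I, \lambda) & \text{if } q_n := \inf_{\theta \in \Theta} Q_n(\theta). \end{cases}$

We seek to establish that $\mathcal{C}_n \to_d \mathcal{C}$, where

(C.2)  $\mathcal{C} := \begin{cases} \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } nq_n \to_p 0, \\ \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) - \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) & \text{if } nq_n \not\to_p 0, \end{cases}$

(C.3)  $\Lambda_\infty(\theta_I) := \{0\} \cup \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n} \in \mathrm{int}(\Theta_I) \text{ for all sufficiently large } n\},$

(C.4)  $V_\infty(\theta_I) := \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n} \in \Theta \text{ for all sufficiently large } n\}.$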

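Step I: Case when $nq_n \to_p 0$. Since $nq_n \to_p 0$, we have that

(C.5)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) + o_p(1).$

Hence in what follows we ignore the $o_p(1)$ term.

We would like to show the following simple but important representation:

(C.6)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_n(\theta_I)} \ell_n(\theta_I, \lambda),$

where

(C.7)  $\Lambda_n(\theta_I) := \{0\} \cup \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n'} \in \mathrm{int}(\Theta_I) \text{ for all } n' \in [n, \infty)\}.$

To prove (C.6), we need to show that for each $\theta_I \in \Theta_I$, there exist $\theta_I^* \in \partial\Theta_I$ and $\lambda^* \in \Lambda_n(\theta_I^*)$ such that $\theta_I^* + \lambda^*/\sqrt{n} = \theta_I$. If $\theta_I \in \partial\Theta_I$, then simply take $\theta_I^* = \theta_I$ and $\lambda^* = 0$. If $\theta_I \in \mathrm{int}(\Theta_I)$, take a sufficiently small ball centered at $\theta_I$ of radius $\delta > 0$, denoted $B_\delta(\theta_I)$, such that $B_\delta(\theta_I) \subset \mathrm{int}(\Theta_I)$. Then expand the radius $\delta$ of the ball to find the minimal value of $\delta$ such that the boundary of the ball includes a point $\theta_I^* \in \partial\Theta_I$. Such a radius exists by compactness of $\Theta_I$. Then the points on the line segment between $\theta_I^*$ and $\theta_I$ all belong to $\mathrm{int}(\Theta_I)$ and can be parameterized as $\theta_I^* + \lambda^*/\sqrt{n'}$, $n' \in [n, \infty)$, where $\lambda^* = (\theta_I - \theta_I^*)\sqrt{n}$. By construction $\theta_I^* \in \partial\Theta_I$ and $\lambda^* \in \Lambda_n(\theta_I^*)$. Hence (C.6) is true.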
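In what follows, we distinguish two cases:

Case 1. $\Theta_I$ has an empty interior relative to $\Theta$, so that $\Theta_I = \partial\Theta_I$,
Case 2. $\Theta_I$ has a nonempty interior relative to $\Theta$, so that $\Theta_I \ne \partial\Theta_I$.

Case 1. By C.2-(i)

(C.8)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) = \sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, 0) \to_d \mathcal{C} = \sup_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, 0).$

Since $\mathrm{int}(\Theta_I)$ is empty,

(C.9)  $\Lambda_n(\theta_I) = \Lambda_\infty(\theta_I) = \{0\},$

so that we can also rewrite (C.8) using general notation

(C.10)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_n(\theta_I)} \ell_n(\theta_I, \lambda) \to_d \mathcal{C} = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda).$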

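Case 2. For any $\delta > 0$ decompose

(C.11)  $\mathcal{C}_n = \max[\mathcal{C}_n'(\delta), \mathcal{C}_n(\delta)],$

where

(C.12)  $\mathcal{C}_n'(\delta) := \sup_{\theta_I \in I_n(\delta)} \ell_n(\theta_I, 0) \quad \text{and} \quad \mathcal{C}_n(\delta) := \sup_{\theta_I \in \Theta_I \setminus I_n(\delta)} \ell_n(\theta_I, 0),$

and $I_n(\delta) = \{\theta_I \in \mathrm{int}(\Theta_I) : \inf_{\theta_I' \in \partial\Theta_I} \|\theta_I - \theta_I'\| \ge \delta/\sqrt{n}\}$, as defined in Assumption C.1.

By Assumption C.1, for any $\epsilon > 0$ there exists $\delta$ sufficiently large such that

(C.13)  $\liminf_{n \to \infty} P_*\bigl(\mathcal{C}_n'(\delta) = 0\bigr) \ge 1 - \epsilon.$

Observe also that $\mathcal{C}_n(\delta) \ge 0$ by construction. Hence for any $\epsilon > 0$ there exists $\delta$ sufficiently large such that

(C.14)  $\liminf_{n \to \infty} P_*\bigl\{\mathcal{C}_n = \mathcal{C}_n(\delta)\bigr\} \ge 1 - \epsilon.$

Observe that

(C.15)  $\mathcal{C}_n(\delta) = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_n(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_n(\theta_I, \lambda).$

By Assumption C.2 (i.-ii.), for any compact set $K$

(C.16)  $\sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \cdot) \Rightarrow \sup_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, \cdot) \ \text{in } L^\infty(K),$

where $\lambda \mapsto \sup_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, \lambda)$ has uniformly continuous paths. The next claim is that (C.16) implies, by the Continuous Mapping Theorem, that

(C.17)  $\mathcal{C}_n(\delta) \to_d \mathcal{C}(\delta) := \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_\infty(\theta_I, \lambda).$

The proof of this claim is given below. Hence for any closed set $F$

(C.18)  $\limsup_{n \to \infty} P^*\{\mathcal{C}_n(\delta) \in F\} \le P\{\mathcal{C}(\delta) \in F\}.$

Hence by (C.13) and (C.14), for any $\epsilon > 0$ there exists $\delta_\epsilon$ large enough such that

(C.19)  $\limsup_{n \to \infty} P^*\{\mathcal{C}_n \in F\} \le P\{\mathcal{C}(\delta_\epsilon) \in F\} + \epsilon.$

Therefore, taking $\epsilon \to 0$ and $\delta_\epsilon \to \infty$ accordingly, it follows that

(C.20)  $\limsup_{n \to \infty} P^*\{\mathcal{C}_n \in F\} \le P\{\mathcal{C} \in F\},$

where

(C.21)  $\mathcal{C} := \lim_{\delta \to \infty} \mathcal{C}(\delta) = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) \ \text{a.s.}$

The limit $\mathcal{C}$ exists in $\bar{\mathbb{R}}$ by the Monotone Convergence Theorem. By construction $\mathcal{C} \ge 0$ a.s., and by Assumption C.2(iii) $\mathcal{C} < \infty$ a.s. Conclude by the Portmanteau lemma (van der Vaart and Wellner (1996), p. 20) and (C.20) that

(C.22)  $\mathcal{C}_n \to_d \mathcal{C}.$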

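Proof of $\mathcal{C}_n(\delta) \to_d \mathcal{C}(\delta)$. Observe that

(C.23)  $\Lambda_{n_0}(\theta_I) \subseteq \Lambda_n(\theta_I) \subseteq \Lambda_\infty(\theta_I)$, for all $n \ge n_0$, all $n_0 \ge 1$,

and $\Lambda_n(\theta_I) \nearrow \Lambda_\infty(\theta_I)$, in the sense that $\Lambda_n(\theta_I)$ is a nested sequence and $\Lambda_n(\theta_I) \to \Lambda_\infty(\theta_I)$, i.e.

(C.24)  $\bigcup_{n_0 = 1}^{\infty} \bigcap_{n = n_0}^{\infty} \Lambda_n(\theta_I) = \bigcap_{n_0 = 1}^{\infty} \bigcup_{n = n_0}^{\infty} \Lambda_n(\theta_I) = \Lambda_\infty(\theta_I),$

since by the definitions given earlier, for all $n_0 \ge 1$,

(C.25)  $\bigcup_{n = n_0}^{\infty} \Lambda_n(\theta_I) = \Lambda_\infty(\theta_I) \quad \text{and} \quad \bigcap_{n = n_0}^{\infty} \Lambda_n(\theta_I) = \Lambda_{n_0}(\theta_I).$

Therefore, for $n \ge n_0$,

(C.26)  $\underline{\mathcal{C}}_n(\delta) := \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_{n_0}(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_n(\theta_I, \lambda) \ \le\ \mathcal{C}_n(\delta) = \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_n(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_n(\theta_I, \lambda) \ \le\ \bar{\mathcal{C}}_n(\delta) := \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_n(\theta_I, \lambda),$

and

(C.27)  $\underline{\mathcal{C}}(\delta) := \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_{n_0}(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_\infty(\theta_I, \lambda) \ \le\ \mathcal{C}(\delta) := \sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_\infty(\theta_I, \lambda).$

By (C.16) and the Continuous Mapping Theorem

(C.28)  $\bar{\mathcal{C}}_n(\delta) \to_d \mathcal{C}(\delta),$

and for any fixed $n_0$

(C.29)  $\underline{\mathcal{C}}_n(\delta) \to_d \underline{\mathcal{C}}(\delta).$

Let the underlying probability space be denoted as $(\Omega, \mathcal{F}, P)$. Observe that for every $\omega \in \Omega$,

(C.30)  $\sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I) \cap \{\|\lambda\| \le \delta\}} \ell_\infty(\theta_I, \lambda)$

is necessarily attained at some $\theta_I^*(\omega) \in \partial\Theta_I$ and $\lambda^*(\omega) \in \Lambda_\infty(\theta_I^*(\omega)) \cap \{\|\lambda\| \le \delta\}$, because the set $\partial\Theta_I$ is compact and $\Lambda_\infty(\theta_I^*(\omega)) \cap \{\|\lambda\| \le \delta\}$ is also compact. That is,

(C.31)  $\mathcal{C}(\delta) = \ell_\infty(\theta_I^*(\omega), \lambda^*(\omega)) \ \text{a.s.}$

Moreover, as $n_0 \to \infty$,

(C.32)  $\Lambda_{n_0}(\theta_I) \cap \{\|\lambda\| \le \delta\} \nearrow \Lambda_\infty(\theta_I) \cap \{\|\lambda\| \le \delta\}$ for each $\theta_I \in \partial\Theta_I$,

hence

(C.33)  $\Lambda_{n_0}(\theta_I^*(\omega)) \cap \{\|\lambda\| \le \delta\} \nearrow \Lambda_\infty(\theta_I^*(\omega)) \cap \{\|\lambda\| \le \delta\} \ \text{a.s.},$

so that by continuity of $\lambda \mapsto \ell_\infty(\theta_I, \lambda)$, we have that

(C.34)  $\underline{\mathcal{C}}(\delta) \nearrow \lim_{n_0 \to \infty} \underline{\mathcal{C}}(\delta) = \mathcal{C}(\delta) \ \text{a.s.}$

Note that to obtain (C.34) we use that $\underline{\mathcal{C}}(\delta)$ is monotonically increasing in $n_0$ due to (C.32), so that $\lim_{n_0 \to \infty} \underline{\mathcal{C}}(\delta)$ exists a.s., and by (C.27)

(C.35)  $\lim_{n_0 \to \infty} \underline{\mathcal{C}}(\delta) \le \mathcal{C}(\delta) \ \text{a.s.}$

Moreover, for $\theta_I^*(\omega)$ defined above there exists a sequence $\lambda_{n_0}(\omega)$ in $\Lambda_{n_0}(\theta_I^*(\omega)) \cap \{\|\lambda\| \le \delta\}$ such that $\lambda_{n_0}(\omega) \to \lambda^*(\omega)$ a.s. Hence by continuity of $\lambda \mapsto \ell_\infty(\theta_I, \lambda)$ at each $\theta_I \in \partial\Theta_I$,

(C.36)  $\lim_{n_0 \to \infty} \underline{\mathcal{C}}(\delta) \ge \lim_{n_0 \to \infty} \ell_\infty(\theta_I^*(\omega), \lambda_{n_0}(\omega)) = \ell_\infty(\theta_I^*(\omega), \lambda^*(\omega)) = \mathcal{C}(\delta) \ \text{a.s.}$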

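Now we are ready to close the argument. Let $F$ be any real number such that $P\{\mathcal{C}(\delta) = F\} = 0$; then

(C.37)
$P\{\mathcal{C}(\delta) \le F\} \overset{(1)}{=} \lim_{n \to \infty} P_*\{\bar{\mathcal{C}}_n(\delta) \le F\}$
$\overset{(2)}{\le} \liminf_{n \to \infty} P_*\{\mathcal{C}_n(\delta) \le F\}$
$\overset{(3)}{\le} \limsup_{n \to \infty} P^*\{\mathcal{C}_n(\delta) \le F\}$
$\overset{(4)}{\le} \limsup_{n \to \infty} P^*\{\underline{\mathcal{C}}_n(\delta) \le F\}$
$\overset{(5)}{\le} P\{\underline{\mathcal{C}}(\delta) \le F\}$ for any $n_0 \ge 1$
$\overset{(6)}{\to} P\{\mathcal{C}(\delta) \le F\}$ as $n_0 \to \infty$,

where (1) is by (C.28) and the Portmanteau lemma (van der Vaart and Wellner (1996), p. 20), (2)-(4) by (C.26), (5) by the Portmanteau lemma, and (6) is by (C.34) and the Portmanteau lemma. Thus for any $F$ such that $P\{\mathcal{C}(\delta) = F\} = 0$,

(C.38)  $\liminf_{n \to \infty} P_*\{\mathcal{C}_n(\delta) \le F\} = \limsup_{n \to \infty} P^*\{\mathcal{C}_n(\delta) \le F\} = P\{\mathcal{C}(\delta) \le F\}.$

Thus, by the Portmanteau lemma

(C.39)  $\mathcal{C}_n(\delta) \to_d \mathcal{C}(\delta).$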

Step  II. Case  when  ©/  =  90/  and  nqn  -f*p  0:  Consider  two  cases: 

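Case 1. $\Theta_I$ has a nonempty interior relative to $\Theta$, so that $\Theta_I \ne \partial\Theta_I$,
Case 2. $\Theta_I$ has an empty interior relative to $\Theta$, so that $\Theta_I = \partial\Theta_I$.

Case 1. In this case by Assumption C.1

(C.40)  $nq_n \to_p 0,$

which Step I has taken care of.

Case 2. We have by definition of $\ell_n(\theta_I, \lambda)$ and $\mathcal{C}_n$,

(C.41)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) - \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta} \ell_n(\theta_I, \lambda).$

We need to show that

(C.42)  $\Bigl(\sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0),\ \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta} \ell_n(\theta_I, \lambda)\Bigr) \to_d \Bigl(\sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda),\ \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda)\Bigr).$

Define

(C.43)  $\mathcal{M}_n := \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta} \ell_n(\theta_I, \lambda) \quad \text{and} \quad \mathcal{M} := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda).$

Below we show only the marginal convergence $\mathcal{M}_n \to_d \mathcal{M}$. The joint convergence (C.42) follows by combining the arguments of the proofs of Step I and Step II and applying the Cramér-Wold device.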

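Write

(C.44)  $\mathcal{M}_n = \min[\mathcal{M}_n'(K), \mathcal{M}_n(K)],$

where

(C.45)  $\mathcal{M}_n'(K) := \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta,\ \nu(\theta_I, \lambda) > K/\sqrt{n}} \ell_n(\theta_I, \lambda), \qquad \mathcal{M}_n(K) := \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta,\ \nu(\theta_I, \lambda) \le K/\sqrt{n}} \ell_n(\theta_I, \lambda),$

where

(C.46)  $\nu(\theta_I, \lambda) := \inf_{\theta_I' \in \Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta_I'\|.$

By Assumption C.3, for any $\epsilon > 0$, there is $K$ large enough so that

(C.47)  $\liminf_{n \to \infty} P_*\{\mathcal{M}_n'(K) \ge C_1 \min[K^2, n\delta^2]\} \ge 1 - \epsilon,$

for some constants $C_1 > 0$ and $\delta > 0$ that do not depend on $\epsilon$.
First observe that

(C.48)  $\mathcal{M}_n(K) = \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta,\ \nu(\theta_I, \lambda) \le K/\sqrt{n}} \ell_n(\theta_I, \lambda) \overset{(1)}{=} \inf_{\theta_I + \lambda/\sqrt{n} \in \Theta,\ \|\lambda\| \le K} \ell_n(\theta_I, \lambda) \overset{(2)}{=} \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_n(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_n(\theta_I, \lambda),$

where

(C.49)  $V_n(\theta_I) := \{\lambda \in \mathbb{R}^d : \theta_I + \lambda/\sqrt{n'} \in \Theta \text{ for all } n' \in [n, \infty)\}.$

Equality (1) in (C.48) is trivial. First, by compactness of $\Theta_I$ every $\theta_I + \lambda/\sqrt{n}$ such that $\nu(\theta_I, \lambda) \le K/\sqrt{n}$ can be represented as $\theta_I^* + \lambda^*/\sqrt{n}$ where $\|\lambda^*\| = \sqrt{n}\,\nu(\theta_I, \lambda) \le K$. Second, every $\theta_I + \lambda/\sqrt{n}$ with $\|\lambda\| \le K$ trivially satisfies $\sqrt{n}\,\nu(\theta_I, \lambda) \le K$.
Equality (2) in (C.48) is more subtle. To show (2) note that $\{\theta_I + \lambda/\sqrt{n},\ \|\lambda\| \le K,\ \theta_I \in \Theta_I\} \cap \Theta = \{(\theta_I + B_{K/\sqrt{n}}(0)) \cap \Theta,\ \theta_I \in \Theta_I\}$. Note that $(\theta_I + B_{K/\sqrt{n}}(0)) \cap \Theta$ is convex and necessarily contains $\theta_I$, since it is the intersection of two convex sets (by Assumption A.1, $\Theta$ is convex) that contain $\theta_I$. Hence for every $\theta' = \theta_I + \lambda/\sqrt{n} \in (\theta_I + B_{K/\sqrt{n}}(0)) \cap \Theta$, points on the line segment between $\theta'$ and $\theta_I$ are also contained in $(\theta_I + B_{K/\sqrt{n}}(0)) \cap \Theta$. Therefore $\lambda = \sqrt{n}(\theta' - \theta_I) \in V_n(\theta_I) \cap \{\|\lambda\| \le K\}$, and representation (2) follows.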

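By Assumption C.2 (i.-ii.)

(C.50)  $\inf_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \cdot) \Rightarrow \inf_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, \cdot) \ \text{in } L^\infty(K).$

Equipped with (C.48) and (C.50), we can show through the use of the Continuous Mapping Theorem that

(C.51)  $\mathcal{M}_n(K) \to_d \mathcal{M}(K) := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_\infty(\theta_I, \lambda).$

The proof of this claim is stated below.

Hence by (C.47)-(C.51), for any $\epsilon > 0$ there is a sufficiently large $K(\epsilon)$ such that

(C.52)  $\liminf_{n \to \infty} P_*\{\mathcal{M}_n = \mathcal{M}_n(K(\epsilon))\} \ge 1 - \epsilon.$

Note that

(C.53)  $\mathcal{M} = \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) = \lim_{K \to \infty} \mathcal{M}(K).$

$\mathcal{M}$ exists a.s. in $\bar{\mathbb{R}}$ by the Monotone Convergence Theorem. Since $\ell_\infty(\theta_I, \lambda) \ge 0$ and is finite at least for $\lambda = 0$ by C.2, $0 \le \mathcal{M} < \infty$ a.s., i.e. $\mathcal{M}$ is tight.

By (C.51)-(C.52), for any $\epsilon > 0$ and each closed set $F$,

(C.54)  $\limsup_{n \to \infty} P^*\{\mathcal{M}_n \in F\} \le \limsup_{n \to \infty} P^*\{\mathcal{M}_n(K(\epsilon)) \in F\} + \epsilon \le P\{\mathcal{M}(K(\epsilon)) \in F\} + \epsilon.$

Letting $\epsilon \to 0$ and $K(\epsilon) \to \infty$ accordingly, it follows that

(C.55)  $\limsup_{n \to \infty} P^*\{\mathcal{M}_n \in F\} \le P\{\mathcal{M} \in F\}.$

By the Portmanteau lemma conclude that

(C.56)  $\mathcal{M}_n \to_d \mathcal{M}.$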

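Proof of $\mathcal{M}_n(K) \to_d \mathcal{M}(K)$. Observe that

(C.57)  $V_{n_0}(\theta_I) \subseteq V_n(\theta_I) \subseteq V_\infty(\theta_I)$, for all $n \ge n_0$,

so that

(C.58)  $\underline{\mathcal{M}}_n(K) := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_n(\theta_I, \lambda) \ \le\ \mathcal{M}_n(K) = \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_n(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_n(\theta_I, \lambda) \ \le\ \bar{\mathcal{M}}_n(K) := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_{n_0}(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_n(\theta_I, \lambda).$

By (C.50) and the Continuous Mapping Theorem, for any fixed $n_0$

(C.59)  $\bar{\mathcal{M}}_n(K) \to_d \bar{\mathcal{M}}(K) := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_{n_0}(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_\infty(\theta_I, \lambda),$

and

(C.60)  $\underline{\mathcal{M}}_n(K) \to_d \mathcal{M}(K) := \inf_{\theta_I \in \partial\Theta_I,\ \lambda \in V_\infty(\theta_I) \cap \{\|\lambda\| \le K\}} \ell_\infty(\theta_I, \lambda).$

Moreover, as $n_0 \to \infty$

(C.61)  $V_{n_0}(\theta_I) \cap \{\|\lambda\| \le K\} \nearrow V_\infty(\theta_I) \cap \{\|\lambda\| \le K\}$ for each $\theta_I \in \partial\Theta_I$,

so that by continuity of $\lambda \mapsto \ell_\infty(\theta_I, \lambda)$ at each $\theta_I \in \partial\Theta_I$

(C.62)  $\bar{\mathcal{M}}(K) \searrow \mathcal{M}(K) \ \text{a.s.}$

The details of proving (C.62) are nearly identical to those given in (C.31)-(C.36), so they are not repeated.
Let $F$ be any real number such that $P\{\mathcal{M}(K) = F\} = 0$; then

(C.63)
$P\{\mathcal{M}(K) \le F\} \overset{(1)}{=} \lim_{n \to \infty} P^*\{\underline{\mathcal{M}}_n(K) \le F\}$
$\overset{(2)}{\ge} \limsup_{n \to \infty} P^*\{\mathcal{M}_n(K) \le F\}$
$\overset{(3)}{\ge} \liminf_{n \to \infty} P_*\{\mathcal{M}_n(K) \le F\}$
$\overset{(4)}{\ge} \liminf_{n \to \infty} P_*\{\bar{\mathcal{M}}_n(K) \le F\}$
$\overset{(5)}{\ge} P\{\bar{\mathcal{M}}(K) \le F\}$ for any $n_0 \ge 1$
$\overset{(6)}{\to} P\{\mathcal{M}(K) \le F\}$ as $n_0 \to \infty$,

where (1) is by the Portmanteau lemma and (C.60), (2)-(4) by (C.58), (5) by the Portmanteau lemma, and (6) is by (C.62) and the Portmanteau lemma. Therefore, for any real $F$ such that $P\{\mathcal{M}(K) = F\} = 0$

(C.64)  $\limsup_{n \to \infty} P^*\{\mathcal{M}_n(K) \le F\} = \liminf_{n \to \infty} P_*\{\mathcal{M}_n(K) \le F\} = P\{\mathcal{M}(K) \le F\}.$

Hence by the Portmanteau lemma

(C.65)  $\mathcal{M}_n(K) \to_d \mathcal{M}(K).$

□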

Appendix  D.  Proof  of  Theorem  3.2 

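Steps I and II prove Claim I and Step III proves Claim II.
Step I: Case when $nq_n \to_p 0$. Recall that $Q(\theta_I) = 0$ and

(D.1)  $\ell_n(\theta_I, \lambda) := n\bigl(Q_n(\theta_I + \lambda/\sqrt{n}) - Q(\theta_I)\bigr).$

Since $nq_n \to_p 0$, we have that

(D.2)  $\mathcal{C}_n = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) + o_p(1).$

From the proof of Theorem 3.1 we have that

(D.3)  $\sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, 0) = O_p(1).$

Hence, since $\hat k \to_p \infty$,

(D.4)  $\mathcal{C}_n \le \hat k$ wp $\to 1$ $\ \Rightarrow\ \Theta_I \subseteq C_n(\hat k)$ wp $\to 1$ $\ \Rightarrow\ h(\Theta_I, C_n(\hat k)) = 0$ wp $\to 1$.

Condition C.3 implies that for any $K \to \infty$ with $K/\hat k \to 0$ there are positive constants $C_1$ and $\delta$ such that wp $\to 1$

(D.5)  $C_1 \cdot n \cdot \min[\nu(\theta_I, \lambda)^2, \delta^2] \le \ell_n(\theta_I, \lambda)$

uniformly in $(\theta_I, \lambda)$ such that $\nu(\theta_I, \lambda) \ge K/\sqrt{n}$, where $\nu(\theta_I, \lambda) = \inf_{\theta_I' \in \Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta_I'\|$.
By definition of $C_n(\hat k)$ and since $Q(\theta_I) = 0$

(D.6)  $\sup_{\theta \in C_n(\hat k)} nQ_n(\theta) + o_p(1) = \sup_{\theta_I + \lambda/\sqrt{n} \in C_n(\hat k)} \ell_n(\theta_I, \lambda) + o_p(1) \le \hat k.$

Hence

(D.7)  $\theta_I + \lambda/\sqrt{n} \in C_n(\hat k)$ implies $\ell_n(\theta_I, \lambda) \le \hat k + o_p(1).$

Then the claim is that wp $\to 1$

(D.8)  $C_n(\hat k) \subseteq \Theta_I^{2(\hat k/C_1)^{1/2}/\sqrt{n}},$

where $\Theta_I^{c} := \{\theta_I + t : \|t\| \le c, \theta_I \in \Theta_I\}$. Suppose otherwise; then for some $\theta_I + \lambda/\sqrt{n} \in C_n(\hat k)$,

(D.9)  $\nu(\theta_I, \lambda) = \inf_{\theta_I' \in \Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta_I'\| > 2(\hat k/C_1)^{1/2}/\sqrt{n}.$

Then for this pair $(\theta_I, \lambda)$, wp $\to 1$

(D.10)  $4\hat k \overset{(a)}{<} C_1 \cdot n \cdot \min[\nu(\theta_I, \lambda)^2, \delta^2] \overset{(b)}{\le} \ell_n(\theta_I, \lambda) \overset{(c)}{\le} \hat k + o_p(1),$

where (a) is by (D.9) and by $\hat k \to \infty$ and $\hat k/n \to 0$, so that $C_1 \cdot n \cdot \min[\nu(\theta_I, \lambda)^2, \delta^2] > C_1 \cdot n \cdot \min[4(\hat k/C_1)/n, \delta^2] \ge 4\hat k$ wp $\to 1$; (b) is by (D.5); and (c) is by (D.7). This yields a contradiction. Thus, (D.8) is true.
Combining (D.4) and (D.8) it follows that wp $\to 1$

(D.11)  $d_H(\Theta_I, C_n(\hat k)) \le h(C_n(\hat k), \Theta_I) \le h(\Theta_I^{2(\hat k/C_1)^{1/2}/\sqrt{n}}, \Theta_I) \le 2(\hat k/C_1)^{1/2}/\sqrt{n}.$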

Step II: Case when $nq_n \not\to_p 0$. Observe the relationship between the "centered" and "un-centered" confidence regions. Let the un-centered region be denoted as before:

(D.12)  $C_n(c) := \{\theta : nQ_n(\theta) \le c\},$

and the centered version be denoted as (in this proof only):

(D.13)  $\tilde C_n(c) := \{\theta : n(Q_n(\theta) - q_n) \le c\}$, where $q_n := \inf_{\theta \in \Theta} Q_n(\theta).$

Observe the following key relationship between the two sets:

(D.14)  $\tilde C_n(k) = C_n(k + nq_n)$ for all $k \ge 0.$

Let $\hat k \to \infty$ but $\hat k = o_p(n)$. From the proof of Theorem 3.1 we know that

(D.15)  $\mathcal{M}_n = nq_n = O_p(1)$ since $\mathcal{M}_n \to_d \mathcal{M}.$

Hence

(D.16)  $\tilde k := (\hat k + nq_n) = \hat k \cdot (1 + o_p(1)).$

By (D.14)

(D.17)  $\tilde C_n(\hat k) = C_n(\tilde k)$, so $d_H(\tilde C_n(\hat k), \Theta_I) = d_H(C_n(\tilde k), \Theta_I).$

Hence by (D.11) and (D.16) it follows that wp $\to 1$

(D.18)  $d_H(\tilde C_n(\hat k), \Theta_I) \le d_H(C_n(\tilde k), \Theta_I) \le 2(\tilde k/C_1)^{1/2}/\sqrt{n} \le \bigl(2(\hat k/C_1)^{1/2}/\sqrt{n}\bigr)(1 + o_p(1)),$

and the claim now follows.
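The relationship (D.14) between the centered and un-centered level sets can be verified numerically on a grid. The sketch below is illustrative only: the function name `level_set`, the grid, and the quadratic toy criterion are assumptions and do not come from the paper.

```python
import numpy as np

def level_set(theta_grid, Qn_vals, n, c, centered=False):
    """Return grid points theta with n * (Q_n(theta) - q_n) <= c.

    q_n is the minimum of Q_n over the grid when centered=True,
    and 0 otherwise (the un-centered region in (D.12)).
    """
    q_n = Qn_vals.min() if centered else 0.0
    return theta_grid[n * (Qn_vals - q_n) <= c]

# Toy criterion on a grid whose minimum value is strictly positive.
n = 100
theta_grid = np.linspace(0.0, 1.0, 101)
Qn_vals = (theta_grid - 0.3) ** 2 + 0.01
q_n = Qn_vals.min()
k = 2.0

centered_set = level_set(theta_grid, Qn_vals, n, k, centered=True)
shifted_set = level_set(theta_grid, Qn_vals, n, k + n * q_n, centered=False)
```

In this toy example the two arrays coincide, which is the grid analogue of $\tilde C_n(k) = C_n(k + nq_n)$: centering the criterion is the same as raising the un-centered cutoff by $nq_n$.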

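Step III proves Claim II, that for $\hat k \in [c_0, c_1] \cdot \ln n$, wp $\to 1$

(D.19)  $\Theta_I \subseteq C_n(\hat k) \subseteq \Theta_I^{\epsilon_n}$, where $b\Bigl(\sup_{\theta \in \Theta_I^{\epsilon_n}} Q_b(\theta) - \sup_{\theta \in \Theta_I} Q_b(\theta)\Bigr) = o_p(1),$

$\Theta_I^{\epsilon_n} = \{\theta_I + t : \|t\| \le \epsilon_n, \theta_I \in \Theta_I\}$, and $\epsilon_n \to 0$ is a sequence of positive constants.
We have that $\hat k \in [c_0, c_1] \cdot \ln n$ wp $\to 1$. Hence, by Claim I

(D.20)  $\Theta_I \subseteq C_n(\hat k) \subseteq \Theta_I^{\epsilon_n}$ wp $\to 1$, where $\epsilon_n = C \cdot \ln n/\sqrt{n}$

for some $C > 0$. Then, using $Q(\theta_I) = 0$,

(D.21)
$0 \le b\Bigl(\sup_{\theta \in \Theta_I^{\epsilon_n}} Q_b(\theta) - \sup_{\theta_I \in \Theta_I} Q_b(\theta_I)\Bigr)$
$= \sup_{\theta_I + \lambda/\sqrt{b} \in \Theta_I^{\epsilon_n}} b\bigl(Q_b(\theta_I + \lambda/\sqrt{b}) - Q(\theta_I)\bigr) - \sup_{\theta_I \in \Theta_I} b\bigl(Q_b(\theta_I) - Q(\theta_I)\bigr)$
$\le \sup_{\theta_I \in \Theta_I,\ \|\lambda\| \le \sqrt{b}\epsilon_n} \ell_b(\theta_I, \lambda) - \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)$
$= \max\Bigl[\sup_{\theta_I \in \partial\Theta_I,\ \|\lambda\| \le \sqrt{b}\epsilon_n} \ell_b(\theta_I, \lambda),\ \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)\Bigr] - \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)$
$\overset{(a)}{=} \max\Bigl[\sup_{\theta_I \in \partial\Theta_I} \ell_b(\theta_I, 0) + o_p(1),\ \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)\Bigr] - \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)$
$\overset{(b)}{=} \max\Bigl[\sup_{\theta_I \in \partial\Theta_I} \ell_b(\theta_I, 0),\ \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)\Bigr] + o_p(1) - \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)$
$= \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0) + o_p(1) - \sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0)$
$= o_p(1),$

where (a) follows from the uniform stochastic equicontinuity of the process $\lambda \mapsto \sup_{\theta_I \in \partial\Theta_I} \ell_b(\theta_I, \lambda)$ over $K$ assumed in C.2 and from

(D.22)  $\sqrt{b}\,\epsilon_n \to 0,$

which follows from the assumption that

(D.23)  $b/n \to 0$ at polynomial rate, so that $\sqrt{b}\,\epsilon_n \propto \sqrt{b/n}\,\ln n \to 0;$

(b) follows by the Continuous Mapping Theorem and the proof of Theorem 3.1, which implies that as $b \to \infty$

(D.24)  $\Bigl(\sup_{\theta_I \in \Theta_I} \ell_b(\theta_I, 0),\ \sup_{\theta_I \in \partial\Theta_I} \ell_b(\theta_I, 0)\Bigr) \to_d \Bigl(\sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda),\ \sup_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, 0)\Bigr).$

□

Appendix E. Proof of Theorem 3.3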

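Verification of A.1 is immediate from the stated assumptions. Therefore we will focus on verifying C.1 to C.3.

Step I. Verification of C.1: The identified set $\Theta_I$ is determined by a set of inequalities: $\Theta_I := \{\theta \in \mathbb{R}^d : x_j'\theta \ge \tau_1(x_j) \text{ and } x_j'\theta \le \tau_2(x_j) \text{ for all } j \le J\}$. It is assumed that the $d \times J$ matrix $X := (x_j, j \le J)$ is of rank $d = \dim(\theta)$. Denote $T_k := (\tau_k(x_j), j \le J)$ for $k = 1, 2$. Thus

(E.1)  $\Theta_I = \{\theta \in \mathbb{R}^d : X'\theta = t \text{ for some } t : T_1 \le t \le T_2\}.$

Since $T = \{t \in \mathbb{R}^J : T_1 \le t \le T_2\}$ is compact and $X$ has full rank, $\Theta_I$ is compact. It is assumed that $\Theta_I \subset \Theta$. $\partial\Theta_I$ is determined as

(E.2)  $\partial\Theta_I = \{\theta \in \Theta_I : x_j'\theta = \tau_1(x_j) \text{ or } x_j'\theta = \tau_2(x_j), \text{ for some } j\}.$

That $\theta_I \in I_n(\delta) = \{\theta_I \in \mathrm{int}(\Theta_I) : \inf_{\theta_I' \in \partial\Theta_I} \|\theta_I - \theta_I'\| \ge \delta/\sqrt{n}\}$ implies that $x_j'\theta_I$ must be bounded away from any $\tau_k(x_j)$ by a distance proportional to $\|x_j\|\delta/\sqrt{n}$. Indeed, observe that there exists $\kappa > 0$ such that

(E.3)  $\frac{\|\tau_k(x_j) - x_j'\theta_I\|}{\|x_j\|\,\|\theta_I - \theta_I'\|} \ge \kappa > 0 \quad \forall \theta_I \notin \partial\Theta_I,\ \forall \theta_I' \in \partial\Theta_I : x_j'\theta_I' = \tau_k(x_j),\ \forall (j, k).$

Suppose otherwise that $\kappa = 0$. This implies $x_j'\theta_I = x_j'\theta_I' = \tau_k(x_j)$ for some $j$ and $k$ and some $\theta_I' \in \partial\Theta_I$. This contradicts $\theta_I \notin \partial\Theta_I$. Hence $\kappa > 0$. Hence

(E.4)  $\frac{\|\tau_k(x_j) - x_j'\theta_I\|}{\|x_j\|} \ge \kappa\|\theta_I - \theta_I'\| \quad \forall \theta_I \notin \partial\Theta_I,\ \forall \theta_I' \in \partial\Theta_I : x_j'\theta_I' = \tau_k(x_j),\ \forall (j, k).$

Recall that $\|x_j\| \ge 1$ for all $j$ since the first component of $x_j$ is 1. Hence

(E.5)  $\|\tau_k(x_j) - x_j'\theta_I\| \ge \kappa \inf_{\theta_I' \in \partial\Theta_I} \|\theta_I' - \theta_I\|\,\|x_j\| \quad \forall \theta_I \notin \partial\Theta_I,\ \forall (j, k).$

Thus,

(E.6)  $\theta_I \in I_n(\delta) \ \Rightarrow\ \tau_1(x_j) - x_j'\theta_I \le -\frac{\kappa\delta}{\sqrt{n}}\|x_j\| \ \text{or}\ \tau_2(x_j) - x_j'\theta_I \ge \frac{\kappa\delta}{\sqrt{n}}\|x_j\|, \quad \forall j.$

Recall

(E.7)  $\ell_n(\theta_I, \lambda) = \sum_{j=1}^{J} \hat p_j \bigl[\sqrt{n}\,(\hat\tau_1(x_j) - x_j'\theta_I) - x_j'\lambda\bigr]_+^2 + \sum_{j=1}^{J} \hat p_j \bigl[\sqrt{n}\,(\hat\tau_2(x_j) - x_j'\theta_I) - x_j'\lambda\bigr]_-^2.$

Hence

(E.8)  $\ell_n(\theta_I, 0) = \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - \sqrt{n}\,(x_j'\theta_I - \tau_1(x_j))\bigr]_+^2 + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} - \sqrt{n}\,(x_j'\theta_I - \tau_2(x_j))\bigr]_-^2.$

Hence

(E.9)  $\sup_{\theta_I \in I_n(\delta)} |\ell_n(\theta_I, 0)| \le \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - \kappa\delta\|x_j\|\bigr]_+^2 + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} + \kappa\delta\|x_j\|\bigr]_-^2.$

Choose any $\epsilon > 0$. Since $\|x_j\| \ge 1$, $W_{1j} = O_p(1)$ and $W_{2j} = O_p(1)$ for each $j \le J$, for $\delta$ sufficiently large,

(E.10)  $\limsup_{n \to \infty} \max_{j \le J} P\bigl(W_{1j} > \kappa\delta\|x_j\|\bigr) < \epsilon \quad \text{and} \quad \max_{j \le J} \limsup_{n \to \infty} P\bigl(W_{2j} < -\kappa\delta\|x_j\|\bigr) < \epsilon.$

C.1 is now verified, since it follows that for any $\epsilon > 0$, there exists $\delta$ sufficiently large such that

(E.11)  $\sup_{\theta_I \in I_n(\delta)} |\ell_n(\theta_I, 0)| = 0$ wp $\ge 1 - \epsilon.$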

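Step II. Verification of part i of C.2: Write $\ell_n(\theta_I, \lambda) = \ell_n^{(1)}(\theta_I, \lambda) + \ell_n^{(2)}(\theta_I, \lambda)$, where

(E.12)
$\ell_n^{(1)}(\theta_I, \lambda) = \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - x_j'\lambda\bigr]_+^2\, 1\bigl(x_j'\theta_I = \tau_1(x_j)\bigr) + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} - x_j'\lambda\bigr]_-^2\, 1\bigl(x_j'\theta_I = \tau_2(x_j)\bigr),$
$\ell_n^{(2)}(\theta_I, \lambda) = \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - x_j'\lambda - \sqrt{n}\,(x_j'\theta_I - \tau_1(x_j))\bigr]_+^2\, 1\bigl(x_j'\theta_I > \tau_1(x_j)\bigr) + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} - x_j'\lambda - \sqrt{n}\,(x_j'\theta_I - \tau_2(x_j))\bigr]_-^2\, 1\bigl(x_j'\theta_I < \tau_2(x_j)\bigr).$

Note first that for fixed $\theta_I \in \partial\Theta_I$ and fixed $\lambda$, by an argument similar to that in Step I,

(E.13)  $\ell_n^{(2)}(\theta_I, \lambda) = 0$ wp $\to 1.$

Therefore, by the Continuous Mapping Theorem the finite-dimensional limit of $\ell_n(\theta_I, \lambda)$ is given by

(E.14)  $\ell_\infty(\theta_I, \lambda) = \sum_{j=1}^{J} p_j \bigl[W_{1j} - x_j'\lambda\bigr]_+^2\, 1\bigl(x_j'\theta_I = \tau_1(x_j)\bigr) + \sum_{j=1}^{J} p_j \bigl[W_{2j} - x_j'\lambda\bigr]_-^2\, 1\bigl(x_j'\theta_I = \tau_2(x_j)\bigr).$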

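Step III. Verification of part ii(a) of C.2: First, suppose that $\Theta_I \ne \partial\Theta_I$, that is, the interior of $\Theta_I$ is nonempty relative to $\mathbb{R}^d$. C.2-ii(a) requires us to examine the finite-dimensional limit theory of $\sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \cdot)$. The problem of finding $\sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \lambda)$ for $n \le \infty$ amounts to choosing $\theta_I$ among the finite subset of points $\mathcal{V}_I \subset \Theta_I$ that are defined as the collection of solutions to all systems of $d$ equations of the form:

(E.15)  $x_{j_l}'\theta_I = \tau_{k_l}(x_{j_l}), \quad l = 1, \ldots, d,$

such that $(x_{j_l}, l = 1, \ldots, d)$ has rank $d$ and $(k_l, j_l) \in \{1, 2\} \times \{1, 2, \ldots, J\}$ for each $l$. There is only a finite set of such systems of equations. Then,

(E.16)  $\sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \lambda) = \max_{\theta_I \in \mathcal{V}_I} \ell_n(\theta_I, \lambda)$, for $n \le \infty.$

Hence by the Continuous Mapping Theorem the finite-dimensional weak limit of $\max_{\theta_I \in \mathcal{V}_I} \ell_n(\theta_I, \cdot)$ is given by $\max_{\theta_I \in \mathcal{V}_I} \ell_\infty(\theta_I, \cdot)$, and therefore the finite-dimensional weak limit of $\sup_{\theta_I \in \partial\Theta_I} \ell_n(\theta_I, \cdot)$ is given by $\sup_{\theta_I \in \partial\Theta_I} \ell_\infty(\theta_I, \cdot)$.

Second, suppose that $\Theta_I = \partial\Theta_I$, that is, the interior of $\Theta_I$ is empty relative to $\mathbb{R}^d$. Then wp $\to 1$

(E.17)
$\inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \lambda) = \sum_{j \le J} 1\bigl(\tau_1(x_j) = \tau_2(x_j)\bigr)\, \hat p_j \bigl(W_{1j} - x_j'\lambda\bigr)^2,$
$\inf_{\theta_I \in \Theta_I} \ell_\infty(\theta_I, \lambda) = \sum_{j \le J} 1\bigl(\tau_1(x_j) = \tau_2(x_j)\bigr)\, p_j \bigl(W_{1j} - x_j'\lambda\bigr)^2.$

Therefore, by the Continuous Mapping Theorem the finite-dimensional limit of $\inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \cdot)$ is given by $\inf_{\theta_I \in \Theta_I} \ell_\infty(\theta_I, \cdot)$. The joint finite-dimensional convergence of $\bigl(\inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \cdot), \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, \cdot)\bigr)$ follows by the Continuous Mapping Theorem.

Step III. Verification of part ii(b) of C.2: Let $K$ be any compact subset of $\mathbb{R}^d$. To verify the stochastic equicontinuity of $\max_{\theta_I \in \mathcal{V}_I} \ell_n(\theta_I, \lambda)$ over $\lambda \in K$, where $\mathcal{V}_I$ is a finite subset of $\partial\Theta_I$, it suffices to show that for any fixed $\theta_I \in \partial\Theta_I$, $\ell_n(\theta_I, \lambda)$ is stochastically equicontinuous in $\lambda \in K$. This immediately follows from convexity of $\ell_n(\theta_I, \cdot)$ in $\lambda$ and finite-dimensional convergence of $\ell_n(\theta_I, \cdot)$ to a convex continuous function $\ell_\infty(\theta_I, \cdot)$ over $\mathbb{R}^d$, proven in Step I. Likewise, stochastic equicontinuity of $\inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \lambda)$ in $\lambda \in K$ also follows from convexity and finite-dimensional convergence of $\inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \cdot)$ to a convex continuous function $\inf_{\theta_I \in \Theta_I} \ell_\infty(\theta_I, \cdot)$, cf. (E.17).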

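Step IV. Verification of part iii of C.2: Any $\lambda \in \Lambda_\infty(\theta_I)$ for $\theta_I \in \partial\Theta_I$ satisfies the property that $x_j'\lambda \ge 0$ whenever $\tau_1(x_j) = x_j'\theta_I$, and $x_j'\lambda \le 0$ whenever $\tau_2(x_j) = x_j'\theta_I$. Hence

(E.18)  $\sup_{\theta_I \in \partial\Theta_I,\ \lambda \in \Lambda_\infty(\theta_I)} \ell_\infty(\theta_I, \lambda) \le \sum_{j=1}^{J} p_j \bigl(|W_{1j}|^2 + |W_{2j}|^2\bigr) < \infty \ \text{a.s.}$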

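Step V. Verification of C.3: For each $(\theta_I, \lambda)$ such that $\nu(\theta_I, \lambda) = \inf_{\theta_I' \in \Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta_I'\| \ge K/\sqrt{n}$, decompose

(E.19)  $\theta_I + \lambda/\sqrt{n} = \theta_I^* + \lambda^*/\sqrt{n}$, where $\theta_I^* = \arg\inf_{\theta \in \partial\Theta_I} \|\theta_I + \lambda/\sqrt{n} - \theta\|$ and $\|\lambda^*\| = \sqrt{n}\,\nu(\theta_I, \lambda) \ge K.$

Any solution $\theta_I^*$ is subject to $1 \le p \le \dim(\theta)$ binding constraints of the form:

(E.20)  $x_{j_l}'\theta_I^* = \tau_{k_l}(x_{j_l}), \quad l = 1, \ldots, p,$

where $X^* := (x_{j_l}, l = 1, \ldots, p)$ has rank $p$ and $(k_l, j_l) \in \{1, 2\} \times \{1, \ldots, J\}$ for each $l$. The matrix $X^{*\prime}X^*$ (which depends on $(\theta_I, \lambda)$) is necessarily one of the finitely many $p \times p$ sub-matrices of $X'X$ that have full rank $p$, and whose eigenvalues are bounded away from zero. Let also

(E.21)  $T^* = (\tau_{k_l}(x_{j_l}), l = 1, \ldots, p)$ and $\mathcal{JK}^* = ((j_l, k_l), l = 1, \ldots, p).$

Since the eigenvalues of $X^{*\prime}X^*$ are bounded away from zero,

(E.22)  $c_1\|\lambda^*\| \le \|X^{*\prime}\lambda^*\|_\infty.$

Moreover, given any index set $\mathcal{JK}^*$, for all $(j_l, k_l) \in \mathcal{JK}^*$:

(E.23)  $x_{j_l}'\lambda^* \le 0$ if $k_l = 1$ or $x_{j_l}'\lambda^* \ge 0$ if $k_l = 2.$

By (E.22), for at least one $(j^*, k^*) \in \mathcal{JK}^*$:

(E.24)  $c_1\|\lambda^*\| \le |x_{j^*}'\lambda^*|$, so that $x_{j^*}'\lambda^* \le -c_1\|\lambda^*\|$ if $k^* = 1$ or $x_{j^*}'\lambda^* \ge c_1\|\lambda^*\|$ if $k^* = 2.$

Decompose $\ell_n(\theta_I, \lambda) = \ell_n^{(1)}(\theta_I, \lambda) + \ell_n^{(2)}(\theta_I, \lambda)$, where

(E.25)
$\ell_n^{(1)}(\theta_I, \lambda) := \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - x_j'\lambda^*\bigr]_+^2\, 1\bigl(x_j'\theta_I^* = \tau_1(x_j)\bigr) + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} - x_j'\lambda^*\bigr]_-^2\, 1\bigl(x_j'\theta_I^* = \tau_2(x_j)\bigr)$
$\ge \hat p_{j^*}\Bigl(\bigl[W_{1j^*} - x_{j^*}'\lambda^*\bigr]_+^2\, 1(k^* = 1) + \bigl[x_{j^*}'\lambda^* - W_{2j^*}\bigr]_+^2\, 1(k^* = 2)\Bigr)$
$\ge p_{j^*}(1 + o_p(1))\Bigl(\bigl[W_{1j^*} + c_1\|\lambda^*\|\bigr]_+^2\, 1(k^* = 1) + \bigl[c_1\|\lambda^*\| - W_{2j^*}\bigr]_+^2\, 1(k^* = 2)\Bigr)$
$\ge p_{j^*}(1 + o_p(1))\bigl[\min[W_{1j^*}, -W_{2j^*}] + c_1\|\lambda^*\|\bigr]_+^2$
$\ge \min_{j \le J} p_j\,(1 + o_p(1))\bigl[\min[W_{1j}, -W_{2j}, j \le J] + c_1\|\lambda^*\|\bigr]_+^2,$

and

(E.26)  $\ell_n^{(2)}(\theta_I, \lambda) := \sum_{j=1}^{J} \hat p_j \bigl[W_{1j} - \sqrt{n}\,(x_j'\theta_I^* - \tau_1(x_j)) - x_j'\lambda^*\bigr]_+^2\, 1\bigl(x_j'\theta_I^* > \tau_1(x_j)\bigr) + \sum_{j=1}^{J} \hat p_j \bigl[W_{2j} - \sqrt{n}\,(x_j'\theta_I^* - \tau_2(x_j)) - x_j'\lambda^*\bigr]_-^2\, 1\bigl(x_j'\theta_I^* < \tau_2(x_j)\bigr).$

Hence, recalling that $\|\lambda^*\| = \sqrt{n}\,\nu(\theta_I, \lambda)$, one has that $\ell_n^{(1)}(\theta_I, \lambda) \ge c\,(\underline{W} + c'\|\lambda^*\|)_+^2$, for some positive constants $c > 0$ and $c' > 0$, where $\underline{W} := \min[W_{1j}, -W_{2j}, j \le J]$. Let $\sqrt{n}\,\nu(\theta_I, \lambda) = \|\lambda^*\| \ge K$. Since $|\underline{W}| = O_p(1)$, for any $\epsilon > 0$, we can select $K$ large enough so that $\underline{W} \ge -c'\sqrt{n}\,\nu(\theta_I, \lambda)/2$ wp $\ge 1 - \epsilon$. Then wp $\ge 1 - \epsilon$,

(E.27)  $\ell_n^{(1)}(\theta_I, \lambda) \ge \tfrac{1}{4}\, c\, c'^2\, n\, \nu(\theta_I, \lambda)^2.$

Since $\ell_n(\theta_I, \lambda) \ge \ell_n^{(1)}(\theta_I, \lambda)$, C.3 is verified. □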
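To make the interval-data criterion concrete, the following minimal Python sketch evaluates a sample criterion whose localized version has the form in (E.7). It assumes that $Q_n(\theta) = \sum_j \hat p_j \{[\hat\tau_1(x_j) - x_j'\theta]_+^2 + [\hat\tau_2(x_j) - x_j'\theta]_-^2\}$ with $[u]_+ = \max(u, 0)$ and $[u]_- = \min(u, 0)$, which is inferred from (E.7) rather than quoted from the main text; the function and argument names are hypothetical.

```python
import numpy as np

def interval_criterion(theta, X, tau1, tau2, p):
    """Q_n(theta): penalize x_j'theta below tau1_j or above tau2_j.

    X has one row per support point x_j; tau1, tau2 are the estimated
    lower and upper bounds; p holds the cell frequencies p_j.
    """
    fitted = X @ theta
    below = np.maximum(tau1 - fitted, 0.0)   # [tau1 - x'theta]_+
    above = np.minimum(tau2 - fitted, 0.0)   # [tau2 - x'theta]_-
    return float(np.sum(p * (below ** 2 + above ** 2)))

def local_criterion(theta_I, lam, n, X, tau1, tau2, p):
    """l_n(theta_I, lambda) = n * Q_n(theta_I + lambda / sqrt(n)), valid when Q_n(theta_I) = 0."""
    return n * interval_criterion(theta_I + lam / np.sqrt(n), X, tau1, tau2, p)

# Toy design with two support points and an intercept as first component.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
tau1 = np.array([0.0, 1.0])
tau2 = np.array([1.0, 2.0])
p = np.array([0.5, 0.5])
```

In this toy design the criterion is zero at any $\theta$ whose fitted values lie inside both intervals, and a perturbation $\lambda$ that pushes a boundary point outside the set produces a strictly positive value of `local_criterion`, in line with the quadratic lower bound that C.3 requires.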

Appendix  F.  Proof  of  Theorem  3.4 

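Step I. Verification of A.1 is immediate from the stated assumptions and from $\{m_i(\theta), \theta \in \Theta\}$ being Donsker (hence Glivenko-Cantelli).

Step II. Verification of C.1 is not needed because $\Theta_I = \partial\Theta_I$ in $\Theta$.
Step III. Verification of C.2. Write

(F.1)
$\ell_n(\theta_I, \lambda) = \sqrt{n}\,E_n m_i(\theta_I + \lambda/\sqrt{n})'\, W_n(\theta_I + \lambda/\sqrt{n})\, \sqrt{n}\,E_n m_i(\theta_I + \lambda/\sqrt{n})$
$= \bigl(G_n m_i(\theta_I + \lambda/\sqrt{n}) + \sqrt{n}\,E m_i(\theta_I + \lambda/\sqrt{n})\bigr)' \times W_n(\theta_I + \lambda/\sqrt{n}) \bigl(G_n m_i(\theta_I + \lambda/\sqrt{n}) + \sqrt{n}\,E m_i(\theta_I + \lambda/\sqrt{n})\bigr).$

By condition ii., that $\{m_i(\theta), \theta \in \Theta\}$ forms a Donsker class, for any compact set $K$,

(F.2)  $G_n m_i(\theta_I + \lambda/\sqrt{n}) = G_n m_i(\theta_I) + o_p(1) \Rightarrow \Delta(\theta_I)$, in $L^\infty(\Theta_I \times K),$

where $\Delta(\theta_I)$ is the Gaussian process with the covariance functions given in the statement of Theorem 3.4. By condition iv.,

(F.3)  $W_n(\theta_I + \lambda/\sqrt{n}) = W(\theta_I) + o_p(1)$, in $L^\infty(\Theta_I \times K).$

By condition iii.,

(F.4)  $\sqrt{n}\,E m_i(\theta_I + \lambda/\sqrt{n}) = G(\theta_I)'\lambda + o(1)$, in $L^\infty(\Theta_I \times K).$

Hence

(F.5)  $\ell_n(\theta_I, \lambda) \Rightarrow \ell_\infty(\theta_I, \lambda) = \bigl(\Delta(\theta_I) + G(\theta_I)'\lambda\bigr)'\, W(\theta_I)\, \bigl(\Delta(\theta_I) + G(\theta_I)'\lambda\bigr)$, in $L^\infty(\Theta_I \times K).$

Hence C.2-i. is verified, since finite-dimensional convergence is implied by weak convergence in $L^\infty(\Theta_I \times K)$. C.2-ii.(a) is verified by the Continuous Mapping Theorem since the functionals $l_n(\lambda) = \inf_{\theta_I \in \Theta_I} \ell_n(\theta_I, \lambda)$ and $u_n(\lambda) = \sup_{\theta_I \in \Theta_I} \ell_n(\theta_I, \lambda)$ are continuous transformations of $\ell_n(\theta_I, \lambda)$. C.2-ii.(b) follows as well, since stochastic equicontinuity of $l_n(\lambda)$ and $u_n(\lambda)$ is implied by stochastic equicontinuity of $\ell_n(\theta_I, \lambda)$ over $\Theta_I \times K$, which is implied by weak convergence of $\ell_n(\theta_I, \lambda)$ in $L^\infty(\Theta_I \times K)$ to the quadratic form of a Gaussian process, $\ell_\infty(\theta_I, \lambda)$.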
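The quadratic form in the first line of (F.1) can be computed directly from data. The sketch below is a hedged illustration only: `moment_fn` and `weight_fn` are hypothetical callables standing in for $m_i(\theta)$ and $W_n(\theta)$, and the toy moment function used in the usage note is an assumption, not a model from the paper.

```python
import numpy as np

def gmm_local_criterion(theta_I, lam, data, moment_fn, weight_fn):
    """l_n(theta_I, lambda) = (sqrt(n) E_n m)' W_n (sqrt(n) E_n m),

    with the moments and the weight matrix evaluated at
    theta = theta_I + lambda / sqrt(n).
    """
    n = len(data)
    theta = np.asarray(theta_I, dtype=float) + np.asarray(lam, dtype=float) / np.sqrt(n)
    m_bar = np.mean([moment_fn(w, theta) for w in data], axis=0)  # E_n m_i(theta)
    g = np.sqrt(n) * np.atleast_1d(m_bar)
    W = np.atleast_2d(weight_fn(theta))
    return float(g @ W @ g)

# Toy usage: a single moment m_i(theta) = w_i - theta with identity weight.
data = np.array([1.0, 2.0, 3.0, 4.0])
moment_fn = lambda w, theta: np.array([w - theta[0]])
weight_fn = lambda theta: np.eye(1)
```

For the toy data the sample moment vanishes at $\theta = 2.5$, so the criterion is zero at $\lambda = 0$ and grows quadratically in $\lambda$, which matches the structure of the limit $\ell_\infty$ in (F.5).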

Step  IV.  Verification  of  C.3.  By  condition  ii.  m,i(Q)  forms  a  Donsker  class.  Hence 

(F.6)  Gnml{9)  =>  A(<?)  in  L°°(0), 

where  A(#)  is  the  Gaussian  process  with  the  covariance  functions  given  in  the  statement  of  Theorem  2.4. 
Hence 

(F.7)  sup  ||Gnmi(0)||  =  0,(1). 

see 

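By condition iv., uniformly in $\theta \in \Theta$, $W_n(\theta) = W(\theta) + o_p(1)$. Define $\xi_n = \inf_{\theta \in \Theta} \mathrm{mineig}(W_n(\theta))$. By condition
iv., $\xi_n \to_p \xi > 0$. Hence, wp $\to 1$,

$$
\begin{aligned}
\inf_{v(\theta_I,\lambda) \ge K/\sqrt{n}} \left( \frac{\ell_n(\theta_I,\lambda)}{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)} \right)
&\ge \inf_{v(\theta_I,\lambda) \ge K/\sqrt{n}} \frac{\xi}{2} \left\| \frac{\sqrt{n}\, E m_i(\theta_I + \lambda/\sqrt{n})}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} + \frac{O_p(1)}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} \right\|^2 \\
&\ge \inf_{v(\theta_I,\lambda) \ge K/\sqrt{n}} \frac{\xi}{2} \left\| \frac{\sqrt{n}\, E m_i(\theta_I + \lambda/\sqrt{n})}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} + \frac{O_p(1)}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} \right\|_\infty^2 .
\end{aligned}
\qquad \text{(F.8)}
$$

Note that the last line differs from the preceding one in that it uses the sup norm instead of the Euclidean norm.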
By  the  partial  identification  condition  (3.15) 

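(F.9)  $\left\| \dfrac{\sqrt{n}\, E m_i(\theta_I + \lambda/\sqrt{n})}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} \right\|_\infty \ge C$

for some constant $C > 0$. For a given $\epsilon > 0$, $K$ can be made arbitrarily large, so that wp $> 1 - \epsilon$

(F.10)  $\left\| \dfrac{O_p(1)}{\sqrt{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)}} \right\|_\infty < C/2$.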

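Hence wp $> 1 - \epsilon$,

(F.11)  $\inf_{v(\theta_I,\lambda) \ge K/\sqrt{n}} \left( \dfrac{\ell_n(\theta_I,\lambda)}{n\,(v(\theta_I,\lambda)^2 \wedge \delta^2)} \right) \ge C' = \dfrac{\xi}{2}\,(C/2)^2$.

Hence wp $> 1 - \epsilon$,

(F.12)  $\ell_n(\theta_I,\lambda) \ge C' \cdot n \cdot (v(\theta_I,\lambda)^2 \wedge \delta^2)$  for all  $v(\theta_I,\lambda) \ge K/\sqrt{n}$.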

Hence  C.3  is  verified.  □ 
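
The two norm bounds used in (F.8) can be checked numerically. The following is a minimal sketch, not part of the proof: the function name `quadratic_form_bounds`, the matrix `W` and the vector `x` are hypothetical stand-ins for the weighting matrix and the scaled moment vector. It confirms that a quadratic form is bounded below by the minimum eigenvalue times the squared Euclidean norm, which is in turn bounded below by the minimum eigenvalue times the squared sup norm.

```python
import numpy as np

def quadratic_form_bounds(W, x):
    """Return (x'Wx, mineig(W)*||x||_2^2, mineig(W)*||x||_inf^2) for symmetric W."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    min_eig = np.linalg.eigvalsh(W)[0]
    quad = float(x @ W @ x)
    euclid = float(min_eig * np.sum(x ** 2))
    sup = float(min_eig * np.max(np.abs(x)) ** 2)
    return quad, euclid, sup

# Illustrative draw: a symmetric positive definite matrix and a random vector.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
W = A @ A.T + 0.5 * np.eye(4)
x = rng.normal(size=4)
quad, euclid, sup = quadratic_form_bounds(W, x)
print(quad, euclid, sup)  # the three values appear in non-increasing order
```

The second bound is the step where the proof passes from the Euclidean norm to the sup norm; it costs nothing in the argument because only a strictly positive lower bound is needed.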
Appendix G. Relationship to Pointwise Inference

Suppose that one is interested in a special parameter $\theta^*$ inside $\Theta_I$. Inference about some $\theta^*$ in $\Theta_I$ is
well motivated when there is a sense in which $\theta^*$ is the "truth". This case typically arises in non-structural
analysis or when it is believed that the models are correct representations of data-generating processes for
some parameter value, i.e. there is $\theta^* \in \mathbb{R}^d$ such that the model law $P_{\theta^*}$ agrees with the actual stochastic law
of data $P$. In this scenario, $\Theta_I$ is not of interest per se, but rather $\theta^*$ is. In method of moments settings, a
similar problem has already been investigated in the context of a dynamic model with censoring by Hu (2002),
IV quantile estimation by Chernozhukov and Hansen (2003), and GMM weak or complete unidentification by
Kleibergen (2002). Imbens and Manski (2004) investigate Wald type inference about $\theta^*$ for the special
case where a real parameter of interest is known to lie in an interval with endpoints that can be consistently
estimated. Here we provide further insights concerning pointwise inference in its relation to regionwise
inference.

Assumption A.4. Suppose there exists $a_n \to \infty$ such that

$a_n (Q_n(\theta_I) - q_n) \to_d \mathcal{C}(\theta_I)$  for all $\theta_I \in \Theta_I$,

where $\mathcal{C}(\theta_I)$ is a random variable. Moreover, for at least one $\theta_I \in \Theta_I$, $\mathcal{C}(\theta_I) > 0$ with positive probability and
has a continuous distribution function on $(0, \infty)$; otherwise, $\mathcal{C}(\theta_I) = 0$ with probability one.

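Theorem G.1. Suppose that Assumptions A.1 and A.4 hold. Let $c^*_\alpha = \sup_{\theta_I \in \Theta_I} c_\alpha(\theta_I)$. Then for any $\theta^*$ in $\Theta_I$,
$\liminf_{n \to \infty} P\{\theta^* \in C_n(c^*_\alpha)\} \ge \alpha$.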

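Proof:

$$
\begin{aligned}
\liminf_{n \to \infty} P\{\theta^* \in C_n(c^*_\alpha)\}
&= \liminf_{n \to \infty} P\{a_n (Q_n(\theta^*) - q_n) \le \sup_{\theta \in \Theta_I} c_\alpha(\theta)\} \\
&\ge \liminf_{n \to \infty} P\{a_n (Q_n(\theta^*) - q_n) \le c_\alpha(\theta^*)\} \ge (1 \text{ or } \alpha) \ge \alpha.
\end{aligned}
$$

□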
The theorem above is constructed using the Anderson-Rubin pointwise testing principle. Notice that the
confidence set constructed in the theorem above is necessarily smaller than the regionwise confidence set in
Theorem 2.1. Notice also that, as a byproduct of our derivation, inequality (1) in (G.3) shows that the set

(G.2)  $C_n(c_\alpha(\cdot)) = \{\theta^* \in \Theta_I : a_n (Q_n(\theta^*) - q_n) \le c_\alpha(\theta^*)\}$

also has pointwise coverage $\alpha$. Hu (2001) proposed the set (G.2) in the context of a partially identified
dynamic censored regression model. Chernozhukov and Hansen (2003) also use this technique for IV quantiles.
Imbens and Manski (2004) construct this set for the case of an interval-identified parameter.$^{13}$ The set $C_n(c_\alpha(\cdot))$
is generally not equal to (it is smaller than) the level set $C_n(c^*_\alpha)$ of the function $a_n Q_n(\theta)$. However, this set
is generally a special case of our construction. This can be seen by defining the new objective function
$\tilde{Q}_n(\theta) := Q_n(\theta)/\max[c_\alpha(\theta), \epsilon]$ for all $\theta \in \Theta_I$.$^{14}$

Further, let $\hat{c}_\alpha(\theta)$ be the subsampling estimate of $c_\alpha(\theta)$ for each $\theta \in \hat{\Theta}_I$.


$^{13}$ A more recent paper than ours, by Ho, Pakes and Porter (2004), has also proposed such sets in the context of
moment inequalities. The difference is that they do not directly work with objective functions.

$^{14}$ This is an equi-quantile transformation of the original objective function. In many examples this is unnecessary,
as objective functions have the equi-quantile property by using optimal weights.
Theorem G.2. Suppose that Assumptions A.1 and A.4 hold. Let $\hat{c}^*_\alpha = \sup_{\theta_I \in \hat{\Theta}_I} \hat{c}_\alpha(\theta_I)$. Then for any $\theta^*$ in $\Theta_I$,
$\liminf_{n \to \infty} P\{\theta^* \in C_n(\hat{c}^*_\alpha)\} \ge \alpha$. (Likewise, a corollary is that $C_n(\hat{c}_\alpha(\cdot))$ also covers $\theta^*$ with probability $\alpha$.)

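Proof: Since $\Theta_I \subset \hat{\Theta}_I$ wp $\to 1$, it follows that

$$
\begin{aligned}
\liminf_{n \to \infty} P\{\theta^* \in C_n(\hat{c}^*_\alpha)\}
&= \liminf_{n \to \infty} P\{a_n (Q_n(\theta^*) - q_n) \le \sup_{\theta \in \hat{\Theta}_I} \hat{c}_\alpha(\theta)\} \\
&\overset{(1)}{\ge} \liminf_{n \to \infty} P\{a_n (Q_n(\theta^*) - q_n) \le \hat{c}_\alpha(\theta^*)\} \\
&\overset{(2)}{=} \liminf_{n \to \infty} P\{a_n (Q_n(\theta^*) - q_n) \le c_\alpha(\theta^*) + o_p(1)\} \\
&\ge (1 \text{ or } \alpha) \ge \alpha.
\end{aligned}
\qquad \text{(G.3)}
$$

Equality (2) follows from the standard argument for subsampling, e.g. the one presented in Step 2 of the
proof of Theorem 2.1. (Inequality (1) shows that $C_n(\hat{c}_\alpha(\cdot))$ also covers $\theta^*$ with probability $\alpha$.)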


□ 
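
The two constructions compared in this appendix can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure of the paper: it uses a hypothetical interval-mean criterion $Q_n(\theta) = (\bar{y}_L - \theta)_+^2 + (\theta - \bar{y}_U)_+^2$, takes $a_n = n$, approximates the parameter space by a grid, picks the subsample size $b = n^{2/3}$, and uses $\{\theta : a_n(Q_n(\theta) - q_n) \le \log n\}$ as an initial set estimate. All function names are illustrative.

```python
import numpy as np

def criterion(grid, y_lo, y_hi):
    """Assumed interval-mean criterion Q(theta), evaluated on a grid of theta."""
    lo, hi = y_lo.mean(), y_hi.mean()
    return np.maximum(lo - grid, 0.0) ** 2 + np.maximum(grid - hi, 0.0) ** 2

def normalized_stat(grid, y_lo, y_hi):
    """a_n * (Q_n(theta) - q_n), with a_n = n and q_n the grid minimum of Q_n."""
    q = criterion(grid, y_lo, y_hi)
    return len(y_lo) * (q - q.min())

def confidence_sets(y_lo, y_hi, grid, alpha=0.95, b=None, n_sub=300, seed=0):
    """Return boolean masks over the grid for the pointwise and regionwise sets."""
    rng = np.random.default_rng(seed)
    n = len(y_lo)
    b = b if b is not None else int(n ** (2 / 3))  # subsample size (assumption)
    stat_n = normalized_stat(grid, y_lo, y_hi)
    sub_stats = np.empty((n_sub, grid.size))
    for j in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)  # subsample without replacement
        sub_stats[j] = normalized_stat(grid, y_lo[idx], y_hi[idx])
    c_hat = np.quantile(sub_stats, alpha, axis=0)   # pointwise critical values
    theta_hat = stat_n <= np.log(n)                 # initial set estimate (assumption)
    c_star = c_hat[theta_hat].max()                 # sup of pointwise critical values
    pointwise = theta_hat & (stat_n <= c_hat)       # analog of the set in (G.2)
    regionwise = stat_n <= c_star                   # level set at the sup critical value
    return pointwise, regionwise, c_hat, c_star

def simulate_interval_data(n, seed=0):
    """Hypothetical interval data: the outcome is only known to lie in [floor(y), floor(y)+1]."""
    y = np.random.default_rng(seed).normal(size=n)
    y_lo = np.floor(y)
    return y_lo, y_lo + 1.0
```

Calling `confidence_sets(*simulate_interval_data(400), np.linspace(-3, 3, 601))` returns the two masks. By construction every grid point in the pointwise set also belongs to the regionwise set, which mirrors the remark that the set in (G.2) is smaller than the level set at the sup critical value.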


[CONSISTENCY  AND  RATES  HERE  TOO?] 


Notation  and  Terms 

$\to_p$  convergence in (outer) probability $P^*$

$\to_d$  convergence in distribution under $P^*$

wp $\to 1$  with inner probability $P_*$ converging to one

wp $> 1 - \epsilon$  with inner probability $P_*$ larger than $1 - \epsilon$ for sufficiently large $n$

$B_\delta(x)$  closed ball centered at $x$ of radius $\delta > 0$

$I$  identity matrix

$N(0, a)$  normal random vector with mean 0 and variance matrix $a$

$\mathcal{F}$ Donsker class  here this means that the empirical process $f \mapsto \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (f(W_i) - E f(W_i))$ is
asymptotically Gaussian in $L^\infty(\mathcal{F})$, see Vaart (1999)

$L^\infty(\mathcal{F})$  metric space of bounded functions over $\mathcal{F}$, see Vaart (1999)

mineig($A$)  minimum eigenvalue of matrix $A$